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Abstract 

This  research  considers  the  efficient  numerical  solution  of  linearly  constrained  mixed 
variable  programming  (MVP)  problems,  in  which  the  objective  function  is  a  "black  box" 
stochastic  simulation,  function  evaluations  may  be  computationally  expensive,  and  deriv¬ 
ative  information  is  typically  not  available.  MVP  problems  are  those  with  a  mixture  of 
continuous,  integer,  and  categorical  variables,  the  latter  of  which  may  take  on  values  only 
from  a  predefined  list  and  may  even  be  non-numeric.  Mixed  Variable  Generalized  Pattern 
Search  with  Ranking  and  Selection  (MGPS-RS)  is  the  only  existing,  provably  convergent 
algorithm  that  can  be  applied  to  this  class  of  problems.  Present  in  this  algorithm  is  an 
optional  framework  for  constructing  and  managing  less  expensive  surrogate  functions  as  a 
means  to  reduce  the  number  of  true  function  evaluations  that  are  required  to  find  approx¬ 
imate  solutions. 

In  this  research,  the  NOMADm  software  package,  an  implementation  of  pattern  search 
for  deterministic  MVP  problems,  is  modified  to  incorporate  a  sequential  selection  with 
memory  (SSM)  ranking  and  selection  procedure  for  handling  stochastic  problems.  In 
doing  so,  the  underlying  algorithm  is  modified  to  make  the  application  of  surrogates  more 
efficient.  A  second  class  of  surrogates  based  on  the  Nadaraya- Watson  kernel  regression 
estimator  is  also  added  to  the  software.  Preliminary  computational  testing  of  the  modified 
software  is  performed  to  characterize  the  relative  efficiency  of  selected  surrogate  functions 
for  mixed  variable  optimization  in  simulated  systems 
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ON  THE  USE  OF  SURROGATE  FUNCTIONS  FOR  MIXED 
VARIABLE  OPTIMIZATION  OF  SIMULATED  SYSTEMS 

Chapter  1  -  Introduction 

1 . 1  Problem  Setting 

Consider  the  problem  of  optimizing  a  real  system  in  which  the  objective  is  to  find  a 
set  of  controllable  parameters  that  minimizes  some  measure  of  performance.  To  validate 
the  results,  direct  experimentation  on  the  system  is  preferred,  but  is  not  always  possible, 
due  to  technological  or  cost  restrictions.  In  this  case,  a  common  practice  is  to  create  a 
mathematical  model  of  the  system  in  terms  of  logical  and  quantitative  relationships  [33] . 
For  a  given  performance  measure  with  a  defined  level  of  abstraction,  a  valid  mathematical 
model  can  usually  be  derived  which  accurately  represents  the  behavior  of  the  actual  system. 
If  the  mathematical  model  is  simple  enough,  then  an  analytic  optimal  solution  may  exist 
in  closed  form.  However,  for  more  complex  models,  simulated  mathematical  models  (often 
referred  to  as  simulation  models )  can  produce  representative  outputs  of  the  real  system. 
The  focus  of  this  research  is  on  simulation-based  models. 

Due  in  part  to  the  ease  of  changing  simulation  configurations,  simulation  models  have 
become  popular  in  performing  such  tasks  as  characterizing  system  behavior,  aiding  in  the 
design  of  real  systems,  and  examining  alternatives  through  what-if  scenarios  to  improve 
the  related  system  performance.  For  example,  simulation  models  are  routinely  constructed 
to  determine  costs  associated  with  manufacturing,  warehousing,  ordering,  and  shipping  of 
goods  under  different  production  disciplines.  If  the  system  constraints  and  performance 
measure(s)  of  interest  are  modeled  in  the  simulation,  then  the  simulation  responses  can  be 
used  with  an  optimization  algorithm  to  determine  production  strategies  that  optimize  a 
performance  measure  for  the  underlying  system.  The  resulting  model  solution  can  thus 
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Figure  1.1.  Simulation-Based  Optimization  Method 
assist  decision  makers  in  developing  proper  courses  of  action  by  augmenting  their  knowledge 
of  the  actual  system. 

The  term  simulation-based  optimization  describes  methodologies  where  complex  sys¬ 
tems  are  "designed,  analyzed,  and  controlled  by  optimizing  the  results  of  computer  simu¬ 
lations"  (Kolda  et  al.  [31]).  As  illustrated  in  Figure  1.1,  the  complex  mathematical  model 
representing  the  system  is  simulated  (usually  repeatedly)  at  a  design  point  specified  by  an 
optimization  algorithm,  generating  a  response  that  is  an  estimate  of  the  system’s  perfor¬ 
mance  measure.  In  this  context,  simulation  is  defined  as  a  numerical  procedure  that  takes 
a  design  vector  of  inputs  and  generates  a  response  based  on  underlying  model  relationships. 
For  a  given  level  of  abstraction,  a  valid  model  can  represent  the  system’s  set  of  controllable 
parameters  as  an  input  design  vector.  Using  the  abstracted  design  vector  as  input,  the 
model  generates  an  estimate  of  the  system’s  performance  measure.  The  resulting  model 
response  is  then  used  by  an  optimization  algorithm  to  introduce  a  new  set  of  input  design 
points  to  be  evaluated  by  the  model.  The  optimization  algorithm  thus  acts  on  the  model, 
rather  than  on  the  actual  system,  in  order  to  optimize  the  performance  measure.  Under 
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the  assumption  of  model  validity,  the  set  of  controllable  system  parameters  that  optimize 
actual  system  performance  will  directly  correspond  to  the  resulting  optimal  design  vector. 

Although  both  cases  are  applicable  to  this  research,  it  is  important  to  note  that  these 
simulation  models  can  be  classified  into  two  groups,  depending  on  the  presence  of  random 
(or  probabilistic)  components  in  the  model.  Simulation  models  that  do  not  contain  random 
components  are  called  deterministic  models.  Deterministic  models  do  not  include  random 
components;  they  explicitly  represent  the  underlying  system,  usually  by  the  use  of  schedules 
or  differential  equations.  Although  these  models  are  generally  analytically  intractable,  the 
model  needs  to  be  run  only  once  at  a  particular  design  vector  to  produce  a  unique  response 
of  the  associated  system  performance. 

Simulation  models  which  contain  random  components  are  called  stochastic  models. 
When  modeling  the  actual  system  of  interest,  random  components  are  often  used  to  rep¬ 
resent  the  uncertainty  of  the  input  parameters  in  the  model.  Although  these  simulation 
models  are  better  able  to  account  for  the  uncertainty,  random  components  produce  ran¬ 
dom  noise  in  the  evaluation  of  the  response  function;  thus,  the  outputs  from  stochastic 
models  are  often  referred  to  as  random  responses.  To  reduce  the  error  in  estimating  the 
true  response,  repeated  sampling  of  a  particular  design  vector  is  traditionally  performed  to 
generate  a  distribution  of  the  associated  system  performance.  Although  a  response  distri¬ 
bution  can  be  generated  from  repeated  sampling,  in  this  research  the  simulation  model  will 
be  assumed  to  have  a  terminating  condition  that  produces  a  single  response.  Therefore, 
approaches  used  in  non-terminating  simulation  models  to  estimate  a  performance  measure, 
such  as  replication/deletion  and  batching  means  (see  Law  and  Kelton  [33]),  are  not  required. 

Random  noise  in  the  evaluation  of  the  responses  from  stochastic  simulation  models 
requires  special  attention  when  estimating  the  performance  measure.  In  the  case  of  de- 
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terministic  models,  the  simulation  response  F(x)  is  a  noise-free  estimator  of  the  system 
performance  f(x),  which  is  dependent  only  upon  the  design  vector  x\  thus  expected  per¬ 
formance  is  given  by 

/(*)  =  E[F(x)]  (1.1) 

where  x  €  Mn  is  the  design  vector.  In  the  case  of  stochastic  models,  the  response  of  the 
simulation  F(x,ip)  depends  not  only  on  the  design  vector  x,  but  also  on  the  random  noise 
of  the  system  i)r,  thus  expected  performance  is  given  by 

f(x)  =  E[F(x,  VO]  =  f  F(x,  VOW)  (1.2) 

Jn 

where  x  is  the  design  vector,  if  €  4/  is  an  element  of  an  underlying  probability  space 
(f&,F,P)  with  sample  space  4/,  sigma-field  T .  and  probability  measure  P.  Because  the 
responses  are  generated  from  a  “black  box”  system  and  the  simulations  are  analytically 
intractable,  the  probability  distribution  that  defines  F(x,  VO  is  assumed  to  be  unknown  but 
can  be  sampled.  In  both  deterministic  and  stochastic  models,  the  simulation  responses  F( ■) 
provide  an  estimate  of  the  system  performance  f(x).  Therefore,  the  objective  function  f 
of  the  underlying  system  is  optimized  by  systematically  improving  the  simulation  response 
function  F. 

In  order  to  optimize  the  underlying  system,  the  simulation  response  can  be  systemat¬ 
ically  improved  by  the  use  of  an  appropriate  optimization  algorithm.  However,  the  choice 
of  an  algorithm  is  restricted  by  certain  assumptions.  In  this  research,  the  simulation  re¬ 
sponse  is  assumed  to  originate  from  a  “black  box” ;  thus  analytical  derivatives  are  typically 
not  available.  Furthermore,  the  responses  may  have  few  correct  digits;  therefore,  approxi¬ 
mation  of  the  gradient  by  techniques  such  as  finite  differencing  or  simultaneous  perturba¬ 
tion  (see  Spall  [54])  may  be  unreliable.  The  applicable  methods  for  this  research  can  be 
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further  restricted  by  allowing  objective  or  constraint  functions  to  be  discontinuous,  have 
undefined  regions,  or  fail  to  return  a  solution  at  certain  design  points.  Because  of  these 
considerations,  gradient-based  optimization  algorithms  cannot  be  effectively  applied. 

The  selection  of  an  applicable  optimization  algorithm  is  also  restricted  by  the  presence 
of  categorical  variables ,  which  are  those  that  can  only  take  on  values  from  a  predefined  list, 
and  may  have  no  ordinal  relationship  to  one  another.  This  restriction  is  common  in  the 
structure  of  many  real-life  systems.  For  example,  a  structural  design  problem  may  include 
the  type  of  material  as  a  variable  ( e.g .  steel,  aluminum,  etc.),  a  production  system  may 
use  different  processing  disciplines  for  a  particular  machine  (first- in- first-out,  last-in- first- 
out,  priority,  etc.),  or  a  networking  problem  may  be  constructed  with  decision  variables  to 
indicate  if  a  particular  supply  point  is  present  (yes,  no).  These  categorical  variables  can 
be  represented  by  integers,  but  the  values  usually  have  no  inherent  ordering  (e.g.  1  =  steel, 
2  =  aluminum,  etc.  for  the  types  of  material).  A  further  complication  is  that  changes  to 
categorical  variable  values  may  change  the  constraints  of  the  problem  and  even  the  number 
of  continuous  variables.  The  class  of  optimization  problems  that  includes  continuous, 
integer- valued,  and  categorical  variables  is  known  as  mixed  variable  programming  (MVP) 
(Audet  and  Dennis  [5]). 

MVP  problems  with  only  a  few  categorical  variables  that  can  assume  only  a  small 
number  of  values  are  usually  not  difficult  to  solve.  In  these  problems,  categorical  settings 
can  be  exhaustively  enumerated  to  produce  subproblems  which  are  solved  and  compared 
to  determine  the  optimal  setting.  However,  real-life  systems  may  involve  a  large  number 
of  discrete  variable  combinations.  Since  repeated  simulation  model  evaluations  would 
be  required  to  solve  each  of  the  subproblems  at  each  categorical  setting  combination,  this 
approach  may  be  too  computationally  expensive  to  use  as  a  solution  approach.  Because  the 
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values  of  the  categorical  variables  are  assigned  from  an  unordered,  predefined  list,  relaxation 
techniques  used  for  traditional  integer  programming  problems  (such  as  branch  and  bound) 
are  not  applicable.  Relaxation  techniques  are  able  to  avoid  exhaustive  enumeration  in 
integer  programming  problems  by  exploiting  the  ordinality  of  the  integer-valued  variables; 
but  in  MVP  problems,  the  categorical  variables  must  be  assigned  a  value  from  the  list  at 
every  iteration,  not  just  at  the  solution. 

This  research  considers  the  optimization  of  simulation  problems  with  mixed  variables, 
where  the  continuous  variables  are  bounded  by  linear  constraints.  This  class  of  MVP 
problems  can  be  expressed  as 


min  f(x)  (1.3) 

where  /  :  0  — >  RU  {+00}  is  a  function  of  unknown  analytical  form  (referred  to  as  the 
objective  function )  and  r  6  O  is  the  vector  of  mixed  design  variables.  The  mixed  variable 
domain  0  is  partitioned  into  continuous  and  discrete  domains  Qc  and  Qd,  respectively,  where 
some  or  all  of  the  discrete  variables  may  be  categorical.  Thus,  each  vector  x  G  can  be 
denoted  as  x  =  ( xc ,  xd )  where  xc  E  Rn°  are  the  continuous  variables  of  dimension  nc  and 
xd  6  Znd  are  the  discrete  variables  of  dimension  nd.  The  domain  of  the  discrete  variables 
nd  C  Zn°  can  be  represented  as  a  subset  of  7Lnd  by  mapping  the  predefined  list  elements 
for  each  categorical  variable  to  integer  values.  The  domain  of  the  continuous  variables 
nc  C  RnC  is  restricted  by  linear  constraints  which  can  be  functions  of  xd.  Since  the  value 
of  the  constraint  bounds  may  be  dependent  on  the  value  of  the  discrete  variables,  for  each 
fixed  set  of  discrete  variable  values  xd  the  associated  constraints  can  be  expressed  as 

nc(xd)  =  {xc  E  Rn‘  :  i{xd)  <  A(xd)xc  <  u(xd)}  (1.4) 
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where  A(xd )  €  MmCxnC;  £(xd),  u(xd)  G  (MU{=too})m  ,  £[xd)  <  u(xd ),  and  mc  <  oo.  For 
convenience,  in  the  remainder  of  this  paper,  the  explicit  dependence  of  Dc  on  xd  as  shown 
in  Equation  (1.4)  is  omitted. 

Due  to  this  inherent  complexity,  few  methods  can  efficiently  solve  MVP  problems. 
One  method  that  can  provide  proven  convergence  to  a  stationary  point  for  linearly  con¬ 
strained  and  continuously  differentiable  MVP  problems  is  the  Audet-Dennis  Mixed  Variable 

Generalized  Pattern  Search  (MGPS)  [5],  which  is  described  in  Section  2.4.1.  This  algo- 

(R) 

rithrn  is  implemented  in  the  NOMADrn  MATLAB^  software  [1].  NOMADrn  is  unique  in 
that  it  not  only  provides  an  implementation  of  MGPS,  but  it  also  handles  nonlinear  con¬ 
straints  through  a  filter  approach  for  infeasible  iterates  (see  Audet  and  Dennis  [8] ) .  Since 
NOMADrn  is  only  applied  in  this  research  to  problems  with  linear  constraints,  the  filter 
approach  is  avoided. 

Included  in  the  MGPS  algorithm  is  an  optional  SEARCH  step  at  each  iteration,  in  which 
the  user  may  attempt  any  finite  strategy  to  reduce  computational  cost.  As  noted  by  Booker 
et  al.  [10] ,  inexpensive  surrogate  functions  can  be  used  within  the  SEARCH  step  to  accelerate 
convergence  to  a  solution  without  affecting  the  overall  convergence  theory.  If  the  surrogate 
is  reasonably  accurate,  then  the  search  may  advance  the  optimization  algorithm  to  good 
solutions  with  fewer  function  evaluations  than  without  a  SEARCH  step.  Since  improvement 
can  be  gained  through  inexpensive  surrogates,  Sriver  [55]  used  a  surrogate-based  search  in  a 
computational  evaluation  of  the  underlying  pattern  search  ranking  and  selection  algorithm 
used  in  this  research.  Sriver  [55]  also  suggested  that  future  work  include  the  examination 
of  surrogate  families  for  stochastic  pattern  search  methods,  which  this  research  does  in  the 
context  of  stochastic  MVP  problems. 
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1 . 2  Approach 

This  research  focuses  on  improving  the  MGPS  algorithm  for  MVP  problems  used  in 
NOMADrn  by: 

1.  Allowing  for  the  use  of  "black  box"  simulation  models  with  stochastic  responses,  and 

2.  Generalizing  the  MGPS  algorithm  for  improved  use  of  surrogates. 

Allowing  for  stochastic  simulation  responses  enables  NOMADrn  to  become  applicable 
to  a  wider  set  of  problem  types.  Generalizing  the  algorithm  can  reduce  the  number  of  func¬ 
tion  evaluations  required  for  convergence  to  a  stationary  point;  therefore,  computational 
efficiency  may  be  improved. 

1.2.1  Use  of  Stochastic  Simulation  Models 

As  part  of  the  MGPS  algorithm,  the  incumbent  best  solution  is  only  updated  when 
a  trial  point  provides  a  better  response.  When  deterministic  simulation  models  are  used, 
this  ensures  monotonic  system  performance  improvement,  since  selection  of  the  incumbent 
is  accomplished  by  comparison  of  deterministic  system  responses.  However,  when  sto¬ 
chastic  simulation  models  are  used,  the  randomness  of  system  responses  complicates  the 
selection  process.  To  achieve  a  given  statistical  level  of  confidence  that  the  selected  iterate 
provides  true  system  performance  improvement,  multiple  replications  may  be  required  to 
obtain  enough  observations  to  be  confident  that  the  incumbent  selected  is  the  true  "best" 
iterate  (see  Pichitlamken  and  Nelson  [48]).  Since  replications  require  additional  function 
evaluations,  stochastic  simulation  models  are  often  expensive  to  optimize.  Using  the  simple 
statistical  technique  of  pairwise  comparison,  the  number  of  function  evaluations  required  to 
solve  the  iterate  selection  subproblem  has  been  previously  considered  too  computationally 
inefficient  to  implement  in  the  current  optimization  algorithm  (see  Trosset  [60]).  Recent 
developments  in  ranking  and  selection  (R&S)  statistical  methods  offer  the  ability  to  use 


multiple  comparisons  to  solve  the  iterate  selection  subproblem  in  a  more  efficient  manner 
than  pairwise  comparison.  Sriver  demonstrated  in  [56]  that  the  use  of  ranking  and  selec¬ 
tion  procedures  for  the  iterate  selection  subproblem  can  provide  an  efficient  selection  of  the 
incumbent  while  maintaining  the  required  level  of  statistical  significance.  Thus,  an  R&S 
procedure  is  used  to  control  the  selection  of  the  incumbent. 

1.2.2  Generalizing  the  Algorithm 

As  noted  in  the  previous  subsection,  the  current  MGPS  algorithm  is  able  to  provide 
monotonic  improvement  by  allowing  an  incumbent  to  be  replaced  only  by  an  iterate  which 
provides  a  better  response.  Response  improvement  alone,  however,  does  not  guarantee 
convergence  to  a  stationary  point  for  continuously  differentiable  functions  with  linear  con¬ 
straints.  The  MGPS  algorithm  further  restricts  the  candidate  iterates  to  meet  polling  con¬ 
ditions,  described  in  Section  2.4.1,  in  order  to  meet  underlying  convergence  theory.  The 
MGPS  algorithm  evaluates  points  on  a  mesh  and  includes  an  optional  SEARCH  step  where 
"a  finite  search,  free  of  any  other  rules  imposed  by  the  algorithm,  is  performed  anywhere 
on  the  mesh"  (Audet  and  Dennis  [5]),  which  is  described  in  Section  3.1.  The  flexibility  of 
the  SEARCH  step  allows  the  user  to  employ  any  strategy  favorable  to  the  user,  as  long  as  it 
searches  finitely  many  points  (including  none).  Since  the  SEARCH  step  is  optional,  the  user 
can  fully  personalize  the  SEARCH  step  with  the  understanding  that  underlying  convergence 
theory  is  provided  by  the  other  steps  of  the  algorithm. 

Not  only  does  the  current  MGPS  algorithm  allow  for  only  one  SEARCH  step  per  it¬ 
eration,  but  the  search  is  restricted  to  the  discrete  plane  (i.e.,  same  categorical  variable 
settings)  of  the  incumbent.  This  research  explores  the  use  of  an  additional  SEARCH  step 
that  allow  the  user  the  flexibility  to  apply  a  finite  search  on  the  continuous  regions  associ¬ 
ated  with  any  choice  of  the  discrete  variable  values.  This  additional  SEARCH  step  enables 
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a  more  global  search  of  the  feasible  region  that  may  result  in  faster  convergence  to  a  sta¬ 
tionary  point  or  an  improved  solution  at  the  termination  of  the  algorithm.  For  problems 
with  a  small  number  of  discrete  planes,  the  additional  SEARCH  step  may  coincide  with  the 
current  SEARCH  step  and  thus  may  not  benefit  the  algorithm  significantly.  However,  for 
problems  with  a  large  number  of  categorical  setting  combinations,  the  additional  SEARCH 
step  enables  the  algorithm  to  update  the  incumbent  with  any  iterate  that  provides  a  bet¬ 
ter  response  before  polling  is  performed.  Therefore,  the  reduction  in  function  evaluations 
may  be  more  significant  in  problems  with  a  large  number  of  categorical  variables. 

1 . 3  Summary 

Because  of  the  generality  of  the  targeted  class  of  optimization  problems  and  the  rel¬ 
atively  weak  assumptions,  very  few  methods  are  provably  convergent  for  MVP  problems. 
A  derivative-free  approach,  which  has  been  shown  to  be  convergent  for  MVP  problems,  is 
the  mixed  variable  generalized  pattern  search  (MGPS)  algorithm.  The  purpose  of  this 
research  is  to  develop  an  algorithm  that  can  be  used  to  efficiently  solve  simulation-based 
optimization  problems.  The  research  goal  is  to  improve  the  efficiency  of  the  MGPS  al¬ 
gorithm  proposed  by  Sriver,  extend  the  applicability  of  NOMADrn  to  stochastic,  as  well 
as  deterministic,  simulations,  and  perform  testing  of  appropriate  surrogates.  Specifically, 
the  ranking  and  selection  method  is  used  to  control  the  iterate  selection  subproblem  within 
NOMADrn,  which  when  combined  with  an  additional  SEARCH  step  and  surrogate  functions, 
increases  the  applicability  and  computational  efficiency  of  NOMADrn.  This  approach  is 
applied  to  stochastic  and  deterministic,  continuous  and  mixed  variable  programming  prob¬ 
lems,  and  performance  is  compared  to  that  of  the  native  NOMADrn  performance  in  terms 
of  the  quality  of  the  terminating  solution  and  computational  efficiency. 
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1 .Jf.  Overview 

The  next  chapter  describes  pertinent  background  information  found  in  the  literature. 
Chapter  3  describes  the  specific  ranking  and  selection  procedure  as  well  as  the  surrogate 
functions  used  in  this  research  and  how  they  were  integrated  into  NOMADrn.  Chapter 
4  gives  comparative  results  of  this  implementation  against  the  algorithm  currently  used 
in  NOMADrn  as  applied  to  nonlinear  programming  problems  described  in  Equation  (1.3). 
Finally,  Chapter  5  gives  conclusions  and  recommendations  for  areas  of  further  research. 
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Chapter  2  -  Literature  Review 

The  class  of  problems  that  can  be  represented  as  a  simulation-based  MVP  structure 
is  so  general  that  a  controlled  approach  must  be  used  in  developing  a  provably  convergent, 
computationally  efficient  optimization  algorithm.  Before  addressing  modifications  to  the 
current  algorithms,  a  review  of  the  development  and  characteristics  of  the  methods  used 
to  solve  related  problems  is  required.  In  this  chapter,  the  general  optimization  algorithm 
is  shown  to  be  tailorable  to  the  availability  of  accurate  problem  information,  with  the 
frequently  used  Newton-based  methods  discussed  as  an  example.  Direct  search  methods 
are  then  discussed  with  regard  to  their  applicability  to  this  research.  Finally,  an  overview 
of  ranking  and  selection  procedures  and  surrogates  are  discussed. 

2.1  Unconstrained  Optimization  Algorithms 

Consider  the  problem  of  minimizing  an  unconstrained  continuously  differentiable  func¬ 
tion  /  :  Mn  — >  M, 


min/(x)  (2-1) 

X 

The  selection  of  an  algorithm  for  solving  the  problem  in  Equation  (2.1)  is  usually  depen¬ 
dent  on  the  relative  importance  of  computational  efficiency  and  proven  convergence  to  a 
stationary  point  of  /. 

A  basic  optimization  algorithm  for  solving  the  problem  in  Equation  (2.1)  is  given 
in  Figure  2.1.  By  simply  requiring  that  the  objective  function  decreases  during  each 
iteration,  the  algorithm  can  produce  a  sequence  of  iterates  that  improves  the  objective 
function  value.  If  derivative  information  is  available,  a  common  practice  is  the  tailoring 
of  the  basic  optimization  algorithm  into  a  Newton-based  method.  By  placing  restrictions 
on  the  search  direction  pk  and  step  size  ak  at  each  iteration,  Newton-based  methods  can, 
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General  Optimization  Algorithm 

Initialization:  Choose  a  feasible  starting  point  xq  and  suitable  stopping  criteria. 

Set  the  iteration  counter  k  to  0. 

For  k  =  0, 1, ... 

1.  If  Xk  is  optimal,  stop  and  return  Xk ; 

Otherwise,  calculate  the  search  direction  pk  and  step  size  ak  at  the  current  iteration. 

2.  Compute  an  improved  estimate  of  the  solution: 

•T/c+l  —  d"  CLkPk 

3.  Update  k  =  k  +  1  and  return  to  Step  1. 

Figure  2.1.  General  Optimization  Algorithm  (adapted  from  Nash  and  Sofer  [43]) 
under  mild  conditions,  provide  computationally  efficient,  proven  convergence  to  a  stationary 
point.  Since  Newton-based  methods  are  so  frequently  used  and  provide  the  main  concepts 
for  proven  convergence,  the  following  section  discusses  the  basic  optimization  algorithm 
in  terms  of  a  Newton-based  algorithm.  However,  it  should  be  noted  that  the  Newton- 
based  optimization  algorithms  are  only  one  example  of  the  general  optimization  algorithm 
in  Figure  2.1. 

2.2  Newton-based  Optimization  Algorithms 

Under  ideal  conditions,  Newton’s  method  provides  proven  convergence  to  a  minimizer 
at  a  quadratic  rate,  which,  roughly  speaking,  means  that  the  number  of  correct  digits 
doubles  at  each  iteration  (Nash  and  Sofer  [43]).  For  Newton’s  method,  the  search  direction 
Pk  is  calculated  as  the  solution  of  the  Newton  Equation 

[V2/(xfc)]  pk  =  - V/(xfc )  (2.2) 
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Unfortunately,  the  cost  of  calculating  both  the  gradient  V  f(xk)  and  Hessian  V2/(xfc) 
at  each  iteration  can  be  computationally  expensive,  and  the  Hessian  can  become  ill-conditioned 
(Nocedal  and  Wright  [46]).  Numerous  methods  have  been  developed  which  attempt  to  pre¬ 
serve  the  convergence  rate  while  addressing  these  issues  by  replacing  the  Hessian  with  an 
inexpensive  approximation.  For  quasi-Newton  methods,  the  Hessian  is  replaced  by  a  pos¬ 
itive  definite  approximation  Bk  that  is  obtained  and  updated  at  a  lower  cost.  Usually,  the 
update  methods  that  are  used  to  calculate  Bk  also  have  conditions  to  prevent  the  approxi¬ 
mated  Hessian  from  becoming  too  ill-conditioned  (for  more  details,  see  Nocedal  and  Wright 
[46]).  For  the  steepest  descent  method,  the  Hessian  is  replaced  by  the  identity  matrix  so 
that  pk  =  —V/(xfc).  Although  the  steepest  descent  method  is  able  to  yield  computational 
savings  in  the  calculation  of  the  search  direction,  convergence  may  become  arbitrarily  slow 
(Nash  and  Sofer  [43]). 

2.2.1  Guaranteeing  Convergence 

Although  the  general  algorithm  in  Figure  2.1  includes  those  that  converge  to  a  station¬ 
ary  point,  an  unmodified  algorithm  may  not  make  sufficient  progress  towards  the  point  and, 
in  some  cases,  may  actually  diverge.  In  an  effort  to  enforce  improvement  in  the  solution 
estimate  at  each  iteration,  the  step  size  ak  is  computed  in  the  direction  pk  so  that  simple 
decrease ,  f(xk+ i)  <  /(xfc),  in  the  objective  is  achieved.  That  is,  since  xk+i  =  Xk  +  akpk, 
the  simple  decrease  condition  can  be  written  as 

/(xfc  +  akpk )  <  f{xk)  (2.3) 

Under  mild  assumptions,  requiring  improvement  at  each  iteration  will  keep  the  algorithm 
from  diverging,  but  does  not  ensure  convergence  to  a  stationary  point.  Dennis  and  Schnabel 
[19,  p.  117]  provide  the  following  example  to  illustrate  that  an  algorithm  relying  solely  on 
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a  simple  decrease  condition  may  not  converge  to  a  stationary  point  of  the  objective.  The 
function  f(x)  =  x2  has  only  one  stationary  point  at  x  =  0.  If  the  step  sizes  are  chosen  too 
small  a,k  =  2~fc+1,  then  the  sequence  of  iterates  is  given  by  Xk  =  1  +  2~fc,  which  converges 
to  1.  If  the  step  sizes  are  chosen  too  large  a&  =  2  +  3(2~(fc+1)),  then  the  sequence  is  given 
by  Xk  =  (— l)fc(l  +  2~fe),  with  limit  points  at  ±1.  The  choice  of  search  direction  can  also 
affect  the  convergence  of  the  algorithm.  If  the  search  direction  pk  is  almost  orthogonal  to 
the  direction  of  steepest  descent,  the  algorithm  can  also  stall  (Kolda  et  al.  [31])  at  a  non¬ 
stationary  point.  Therefore,  additional  conditions  must  be  met  for  optimization  algorithms 
to  prove  convergence  for  unconstrained  optimization  problems. 

Optimization  algorithms  generally  use  either  an  embedded  trust  region  or  a  line  search 
methodology  to  meet  conditions  that  prove  convergence  to  a  stationary  point  of  the  objec¬ 
tive.  Since  this  review  is  used  to  support  the  convergence  theory  presented  later  in  the 
document,  only  the  line  search  conditions  are  presented.  For  line  search  methods,  con¬ 
ditions  are  imposed  on  both  the  search  directions  and  the  step  sizes  in  order  to  satisfy 
convergence  conditions.  The  first  condition  required  of  a  search  direction  is  that  the  can¬ 
didate  direction  must  be  a  descent  direction.  A  sufficient  condition  for  pk  to  be  a  descent 
direction  is  that  V/(x*;)  <  0.  This  condition  ensures  that  the  simple  decrease  condi¬ 
tion  in  Equation  (2.3)  will  be  satisfied  for  sufficiently  small  a*,.  The  only  other  condition 
imposed  on  the  search  directions  is  the  sufficient  descent  condition  (Nash  and  Sofer  [43,  p. 
314]),  given  by 


-Pk  V/(zfc) 

llftll  l|V/(**)|| 


>  e  >  0,  k  =  0, 1, . 


(2.4) 


for  some  e  >  0.  This  keeps  the  search  direction  pk  from  approaching  orthogonality  to 
— V/(xfc)  in  the  limit  by  bounding  the  angle  between  the  two.  This  is  more  intuitively 
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seen  if  the  sufficient  descent  condition  is  rewritten  as  cos  6  >  e  >  0,  where  6  is  the  angle 
between  pk  and  —Vf(xk),  often  referred  to  as  the  angle  condition  (Nash  and  Sofer  [43,  p. 
314]). 

Given  these  two  conditions  on  pk,  step  sizes  must  also  be  controlled  to  ensure  conver¬ 
gence  to  a  stationary  point.  The  sufficient  decrease  condition ,  given  by 

f(xk  +  akPk)  <  f(xk )  +  pakplV f{xk)  (2.5) 

where  p  6  (0, 1),  ensures  that  the  step  size  ak  produces  a  decrease  that  is  at  least  as  good 
as  a  fraction  of  what  the  linear  approximation  to  f(xk  +  akpk )  would  produce.  Since  the 
linear  approximation  of  f(xk  +  apk )  is  given  by 

f(xk  +  apk )  «  f(xk)  +  apkV f (xk)  (2.6) 

and  the  descent  direction  necessarily  satisfies  pkV f(xk)  <  0,  the  use  of  the  sufficient  de¬ 
crease  condition  eliminates  the  possible  selection  of  step  sizes  ak  that  are  too  long.  In 
order  to  avoid  step  sizes  that  are  too  short,  a  curvature  condition  must  be  satisfied.  One 
such  condition,  is  given  by 


V/(xfc  -I-  akpk)Tpk  >  rjS7  f  {xk)T pk  (2.7) 

where  ry  is  a  scalar  and  0  <  p  <  rj  <  1.  This  ensures  that  the  slope  at  a  step  size  ak  is  at 
least  g  times  the  slope  at  the  current  point.  The  curvature  condition  eliminates  potential 
choices  of  the  step  size  which  would  produce  slopes  that  are  arbitrarily  close  to  the  gradient 
at  the  current  point,  thereby  ensuring  the  step  size  ak  is  not  too  short.  Equation  (2.5) 
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Backtracking  Line  Search 
Initialization:  Choose  a  >  0,  p  6  (0, 1),  and  0  <  pt  <  pu  <  1. 

1.  If  f(xk  +  apk )  <  f(xk)  +  paS7 f(xk)TPk,  stop  and  return  ak  =  a. 

2.  Compute  a  =  pa  where  p  €  ( pi,pu ). 

Figure  2.2.  Backtracking  Line  Search  (adapted  from  Dennis  and  Schnabel  [19]) 

and  (2.7)  together  are  called  the  Wolfe  conditions.  Alternative  conditions  are  discussed  in 
Nocedal  and  Wright  [46]. 

A  simple  and  convenient  line  search  strategy  is  called  backtracking ,  which  is  detailed 
in  Figure  2.2.  In  backtracking,  the  step  size  is  contracted  iteratively  until  the  sufficient 
decrease  condition  is  satisfied.  Since  pk  is  a  descent  direction,  backtracking  terminates  in 
a  finite  number  of  steps.  Backtracking  also  enforces  the  curvature  condition  automatically 
(for  more  details,  see  Nocedal  and  Wright  [46]). 

2.2.2  Suitability  of  Derivative-based  Methods 

Although  under  ideal  conditions  methods  that  use  derivative  information  ( i.e .  Newton- 
based  methods)  can  provide  quadratic  rates  of  convergence,  the  assumption  that  first  deriv¬ 
ative  information  is  available  (or  can  be  accurately  calculated)  is  not  satisfied  for  all  cases  of 
unconstrained  optimization  problems.  In  particular,  when  complex  simulations  are  used  to 
provide  function  evaluations,  it  may  be  difficult  or  impossible  to  calculate,  or  even  estimate, 
the  derivatives.  For  example,  explicit  derivative  formulas  may  not  exist  for  deterministic 
simulations  that  use  numerical  computation  of  differential  equations  to  determine  function 
evaluations.  Additionally,  approximation  methods  used  on  stochastic  simulations,  such  as 
finite-differencing  or  simultaneous  perturbation  stochastic  approximation  (Spall  [54]),  may 
fail  when  function  values  are  either  only  accurate  to  a  few  digits  or  computationally  pro¬ 
hibitive  in  higher  dimensions.  Therefore,  methods  which  rely  on  the  availability  of  deriv- 
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ative  information  (and  clearly  those  which  depend  on  exact  or  approximate  second-order 
derivative  information,  such  as  quasi-Newton  and  Levenberg-Marquardt  methods)  may  not 
be  rigorously  applicable  when  developing  an  algorithm  to  solve  simulation  optimization 
problems. 

2. 3  Search  Heuristics 

Direct  search  methods  use  direct  comparison  of  functional  responses  to  progress  to  a 
point  that  provides  an  improved  response.  Because  they  do  not  use  derivative  informa¬ 
tion,  direct  search  methods  make  fewer  assumptions  of  the  response  generator  (simulation 
model).  Since  direct  search  methods  only  require  function  responses,  they  are  generally 
more  broadly  applicable  than  Newton-based  methods.  However,  in  exchange  for  general¬ 
ity,  direct  search  methods  may  require  a  large  number  of  function  evaluations  in  order  to 
converge  to  a  solution,  assuming  that  they  converge  at  all. 

As  a  member  of  the  direct  search  methods  class,  search  heuristics  seek  an  acceptable 
improvement  rather  than  a  provably  optimal  solution  by  methodically  searching  the  feasible 
region.  Because  of  their  simplicity  and  ability  to  discover  good  solutions  rapidly  (even 
in  high  dimensional  settings),  search  heuristics  have  become  popular  in  solving  complex 
problems  in  a  variety  of  different  disciplines.  As  representatives  of  search  heuristics,  both 
simulated  annealing  and  evolutionary  algorithms  are  described  in  general  and  as  they  relate 
to  this  research.  Each  is  an  adaptation  of  a  multidisciplinary  technique  originating  from 
an  engineering  discipline. 

2.3.1  Simulated  Annealing 

For  simulated  annealing,  thermodynamic  equilibrium  theory  is  applied  to  the  selection 
of  iterates  which  do  not  satisfy  the  simple  decrease  condition  of  Equation  (2.3).  As  shown 
in  Figure  2.3,  incumbent  replacement  is  based  on  the  objective  function  evaluation  of  the 
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incumbent  and  trial  points.  Trial  points  that  provide  simple  decrease  always  replace  the 
incumbent;  however,  trial  points  that  fail  to  satisfy  this  condition  may  also  replace  the 
incumbent  according  to  a  probability  distribution  that  is  maintained  by  the  algorithm. 
The  probabilistic  selection  offered  by  simulated  annealing  theoretically  allows  the  search 
algorithm  to  escape  areas  of  local  optima  and  actively  search  for  a  global  optimum. 

Annealing  is  the  process  of  heating  and  slowly  cooling  an  object  (typically  glass  or 
metal)  according  to  a  schedule  in  order  to  strengthen  or  harden  the  resulting  product.  For 
optimization  of  systems,  the  object  of  interest  is  represented  by  the  solution  where  strength 
is  measured  as  proximity  to  optimality.  For  complex  optimization  problems,  such  as  steel 
production  or  crystal  formation,  the  cooling  schedule  directly  affects  the  quality  and  cost  of 
the  resulting  product.  For  example,  in  the  formation  of  crystal  from  a  liquid,  the  cooling 
schedule  controls  the  resulting  arrangement  of  atoms  in  the  resulting  solid  crystal.  If 
the  transition  of  physical  states  occurs  too  rapidly,  the  atoms  will  be  unable  to  adequately 
explore  local  rearrangements  resulting  in  the  atoms  being  trapped  in  the  configuration  they 
had  in  the  liquid  state.  Thus,  the  slower  the  cooling  schedule,  the  more  likely  that  the 
atoms  will  find  the  ordering  that  has  the  lowest  global  energy.  On  the  other  hand,  if  the 
system  is  cooled  too  slowly,  the  cost  to  maintain  the  temperature  offsets  the  relative  value 
of  the  resulting  crystal. 

The  concept  of  annealing  was  applied  to  numerical  methods  to  minimize  the  resulting 
energy  of  a  system  by  including  thermodynamic  fluctuations  in  statistical  mechanical  sys¬ 
tems.  Metropolis  et  al.  [41]  proposed  the  use  of  a  state  probability  distribution  to  simulate 
a  system  at  a  fixed  temperature.  The  Boltzmann-Gibbs  energy  state  probability  distribu¬ 
tion  provides  the  relative  probability  of  seeing  a  system  in  a  state  with  an  energy  E  as: 
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General  Simulated  Annealing  Algorithm 

Initialization:  Choose  a  feasible  starting  point  xq  and  Tm in. 

Initialize  the  cooling  schedule  used  to  generate  the  temperature  T. 

Set  x  =  xq- 

1.  If  T  >  Tm in,  stop  and  return  x. 

2.  Randomly  select  a  neighbor  point  y. 

3.  If  (E[y]  <  E[x]),  set  x  =  y\ 

Otherwise,  set  x  =  y  with  probability  exp(—  SjdziEM 

4.  Generate  T  according  to  the  cooling  schedule  and  return  to  Step  1. 

Figure  2.3.  General  Simulated  Annealing  Algorithm 

P(E)=cTex  (2-8) 

where  ct  >  0  is  the  normalizing  constant,  q,  >  0  is  the  Boltzmann  constant,  and  T  is  the 
temperature  of  the  system. 

The  resulting  search  algorithm  is  presented  in  Figure  2.3,  for  a  fixed  temperature  T, 
where  system  states  are  represented  as  points.  The  use  of  the  Boltzmann-Gibbs  distribution 
in  the  iterate  selection  subproblem  provides  a  basis  for  the  probabilistic  acceptance  of  points 
which  do  not  satisfy  the  simple  decrease  condition.  After  a  large  number  of  iterations,  the 
system  eventually  reaches  an  equilibrium  state  governed  by  the  Boltzmann-Gibbs  energy 
distribution  (Spall  [54,  p.  210]). 

Formally  introduced  as  an  optimization  method  by  Kirkpatrick  et  al.  in  [30] ,  simulated 
annealing  was  developed  as  a  method  to  find  a  good  routing  of  wires  on  a  circuit  board. 
By  varying  the  temperature  T  according  to  a  user-defined  cooling  schedule,  the  Metropolis 
algorithm  was  extended  to  a  general  optimization  technique.  When  the  temperature  is 
high,  the  algorithm  is  similar  to  a  random  walk  and  thus  can  globally  explore  the  feasible 
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region.  However,  as  the  algorithm  progresses,  the  temperature  reduction  restricts  the 
acceptance  iterates  that  do  not  strictly  improve  the  quality  of  solution.  Therefore,  when 
the  algorithm  is  applied  to  functions  with  many  local  optima,  this  modified  Monte  Carlo 
search  technique  has  the  ability  to  probabilistically  find  the  global  optimum. 

Two  key  elements  in  the  implementation  of  simulated  annealing  are  the  selection  of 
candidate  iterates  and  the  cooling  schedule.  Selection  of  candidates  iterate  can  be  done 
randomly  as  shown  in  Figure  2.3;  however,  simulated  annealing  can  perform  more  efficiently 
if  the  candidate  iterate  selection  is  based  on  an  algorithm  using  available  problem  informa¬ 
tion.  In  such  an  implementation,  the  Boltzmann-Gibbs  distribution  would  be  used  to  allow 
the  algorithm  to  make  alternative  selections  for  the  best  iterate.  The  selection  of  the  cool¬ 
ing  schedule  represents  a  more  complex  issue.  As  discussed  in  the  formation  of  crystals,  if 
the  cooling  rate  is  too  high,  the  system  will  not  be  able  to  fully  explore  the  feasible  region 
and  may  not  find  the  global  optimum.  If  the  cooling  rate  is  too  low,  the  computational 
efficiency  of  the  algorithm  may  be  reduced.  The  selection  of  an  appropriate  cooling  sched¬ 
ule  and  search  algorithm  are  essential  to  the  convergence  of  simulated  annealing  to  a  global 
optimum;  however,  in  practice,  these  completely  problem-dependent  tuning  parameters  are 
not  known.  Therefore,  simulated  annealing  requires  problem  information  that  is  usually 
not  available  from  a  "black  box"  simulation  to  ensure  convergence  to  a  stationary  point. 

2.3.2  Evolutionary  Algorithms 

Evolutionary  algorithms  comprise  a  large  set  of  numerical  methods  that  uses  adaptive 
(or  evolutionary)  processes  to  seek  out  optimal  solutions.  One  main  distinction  of  evolu¬ 
tionary  algorithms  is  that  they  maintain  a  population  of  current  "best"  iterates  rather  than 
a  single  one,  thus  enabling  them  to  explore  many  points  in  parallel  instead  of  the  common 
serial  approach. 
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2.3. 2.1  Genetic  Algorithms.  Perhaps  the  most  popular  of  the  evolutionary 
algorithms  class  are  genetic  algorithms  (GA).  Originally  developed  by  Holland  [26]  to 
represent  complex  adaptive  processes,  ranging  from  biological  systems  to  economies  to 
political  systems,  genetic  algorithms  were  based  on  principles  of  genetic  variations  and 
natural  selection  to  mimic  the  evolution  of  a  population  over  generations.  The  use  of 
generations  allows  the  whole  population  of  genetic  algorithms  to  be  updated  together;  thus, 
the  state  of  a  GA  is  represented  by  all  the  individuals  of  the  current  population. 

Using  biological  theories  of  competing  traits,  genetic  algorithms  can  directly  maintain 
control  between  trait  dominance  and  randomness  in  the  population  genes  (which  from  an 
optimization  context  is  analogous  to  the  search  for  local  versus  global  optima).  Because 
genetic  algorithms  are  easily  adapted  to  optimization  problems  and  are  able  to  rapidly 
locate  good  solutions,  genetic  algorithms  have  been  accepted  as  an  optimization  tool. 

Since  genetic  algorithms  have  a  basis  in  natural  selection,  the  terminology  uses  many 
biological  names.  For  example,  each  iterate  is  referred  to  as  an  individual  whose  para¬ 
meter  values  are  called  chromosomes.  Although  the  chromosomes  have  traditionally  been 
represented  as  binary  strings,  requiring  standard  bit  (binary  digit)  coding,  extensions  of 
genetic  algorithms  to  multiple  character  coding  is  an  area  of  active  research  (Spall  [54,  p. 
237]).  The  chromosomes  of  an  individual  are  used  to  compute  its  fitness  value,  which  is 
analogous  to  the  objective  function  value  of  an  iterate.  The  resulting  fitness  value  can 
then  be  used  to  determine  both  an  individual’s  inclusion  in  subsequent  generations  and 
matchings  for  offspring  production  (representing  the  "survival  of  the  fittest"  principle  of 
natural  selection).  One  distinction  that  GA  makes  from  natural  selection  theory  is  the 
inclusion  of  an  "elitism"  strategy.  De  Jong,  a  doctoral  student  of  Holland’s  who  systemat¬ 
ically  studied  the  tuning  of  the  genetic  algorithm  parameters,  first  described  an  "elitism" 
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strategy,  which  allows  for  asexual  reproduction  of  individuals  based  on  fitness  values  [18]. 
The  " elite"  individuals  can  be  directly  appended  to  the  next  generation  and  given  priority 
during  crossover  matchings,  thus  ensuring  the  best  chromosomes  are  preserved. 

Genetic  operators  of  crossover  and  mutation  are  then  applied  probabilistically  to  the 
current  population  to  produce  a  new  generation  of  individuals.  During  the  crossover  (or 
recombination)  operation,  the  chromosomes  of  the  individuals  to  be  reproduced,  referred 
to  as  the  parents ,  are  used  in  a  recombination  process  to  determine  the  chromosomes  of 
the  offspring.  The  usefulness  of  the  recombination  process  depends  on  the  nature  of  the 
fitness  function.  If  the  chromosomes  operate  independently  of  one  another  in  the  fitness 
evaluation,  then  crossover  can  quickly  improve  the  offspring’s  fitness  value.  However,  if 
the  chromosomes  are  all  intimately  linked,  the  algorithm  can  skip  the  crossover  operation 
and  rely  on  mutation  to  seek  fitness  improvement.  The  mutation  operation  of  genetic 
algorithms  actively  searches  for  improvements  using  the  current  population.  In  the  basic 
formulation,  the  mutation  operation  randomly  changes  the  chromosomes;  however,  just  as 
with  simulated  annealing,  genetic  algorithms  can  perform  more  efficiently  if  the  mutation  is 
based  on  an  algorithm  using  available  problem  information.  The  resulting  general  genetic 
algorithm  is  given  in  Figure  2.4. 

Although  there  are  several  classes  of  evolutionary  algorithms,  genetic  algorithms  have 
been  the  most  widely  used.  As  an  example  of  other  evolutionary  algorithms,  evolution 
strategies  are  reviewed  in  the  next  subsection. 

2. 3. 2. 2  Evolution  Strategies.  Evolution  strategies  are  specifically  designed 

for  constrained  continuous  variable  optimization.  Instead  of  extending  genetic  algorithms 
by  changing  the  discrete  chromosomal  coding,  evolutionary  strategies  were  developed  in¬ 
dependently  by  Rechenberg  [50]  to  use  the  natural  (continuous)  values  of  the  individual’s 
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General  Genetic  Algorithm 

Initialization:  Choose  an  initial  population  of  N  random  individuals. 

Evaluate  the  fitness  function  for  each  individual. 

Set  Ne  =  0  if  elitism  strategy  is  used,  otherwise  select  0  <  Ne  <  N. 

1.  Select  with  replacement  N  —  Ne  parents  from  the  full  population  (including  the  Ne  elite 
individuals) .  The  selection  of  parents  should  employ  a  strategy  to  ensure  individuals  with 
higher  fitness  value  are  selected  more  often. 

2.  (Crossover)  Apply  a  recombination  process  to  the  chromosomes  of  the  parents  to 
produce  offspring. 

3.  While  retaining  the  Ne  elite  individuals,  replace  the  previous  generation  with  the 
offspring  generated  in  step  2. 

4.  (Mutation)  Apply  an  appropriate  algorithm  to  change  the  chromosomes  of  the  population. 

5.  Compute  the  fitness  values  for  the  new  population.  Terminate  the  algorithm  is  the 
stopping  criterion  is  met  or  if  the  budget  of  fitness  function  evaluations  is  reached; 
otherwise,  return  to  step  1. 

Figure  2.4.  General  Genetic  Algorithm  (adapted  from  Spall  [54]) 

parameters.  In  Rechenberg’s  original  work,  the  population  was  limited  to  a  size  of  one  but 
was  extended  with  the  introduction  of  crossover  to  populations  with  more  than  one  indi¬ 
vidual.  Other  differences  between  genetic  algorithms  and  evolution  strategies  are  related 
to  the  basic  genetic  operations  used  to  generate  offspring  and  population  sets.  Before  indi¬ 
viduals  are  selected  for  inclusion  in  the  current  population,  offspring  are  generated  by  mu¬ 
tating  the  parent  chromosomes  according  to  a  zero  mean  probability  distribution  (usually  a 
normal  distribution)  with  a  variance  that  is  either  user-defined  or  based  on  the  covariance 
of  the  parent  chromosomes.  Offspring  that  do  not  meet  the  constraints  of  the  problem  are 
simply  discarded.  The  resulting  valid  offspring  compete  either  with  or  without  the  par¬ 
ents  for  inclusion  in  the  next  generation.  The  resulting  evolutionary  strategy  algorithm  is 
given  in  Figure  2.5. 


24 


General  Evolution  Strategy  Algorithm 

Initialization:  Choose  an  initial  population  of  N  random  individuals. 

Evaluate  the  fitness  function  for  each  individual. 

1.  (Mutation  and  Crossover)  Apply  an  appropriate  distribution  to  generate  A  offspring  from 
the  current  population  that  satisfy  the  constraints  of  the  problem. 

2.  Evaluate  the  fitness  function  for  each  offspring  generated  in  step  1. 

3.  If  parents  are  to  be  included  in  selection,  select  the  N  best  individuals  from  the 
combined  population  of  the  A  offspring  and  the  N  parent;  otherwise,  select  the  N  best 
individuals  from  the  population  of  A  >  N  offspring  only. 

4.  Replace  the  previous  generation  with  the  N  individuals  selected  in  step  3. 

5.  Terminate  the  algorithm  if  the  stopping  criterion  is  met  or  if  the  budget  of  fitness 
function  evaluations  is  reached;  otherwise,  return  to  step  1. 

Figure  2.5.  General  Evolution  Strategies  Algorithm  (adapted  from  Spall  [54]) 

Just  as  with  simulated  annealing,  the  selection  of  appropriate  algorithmic  parameters 
and  search  algorithm  is  essential  to  the  convergence  of  evolutionary  algorithms  to  a  global 
optimum.  Although  evolutionary  algorithms,  particularly  genetic  algorithms,  have  shown 
success  in  many  diverse  types  of  applications,  they  may  perform  significantly  poorer  than 
even  random  searches.  The  general  lack  of  understanding  in  the  performance  of  evolution¬ 
ary  algorithms  has  motivated  a  large  portion  of  the  current  research  to  characterize  what 
it  is  that  makes  certain  problems  hard  for  these  algorithms. 

2.3.3  Suitability  of  Search  Heuristics 

Through  use  of  embedded  stochastic  processes,  many  of  the  search  heuristics  can 
avoid  local  optima.  The  balance  between  generality  and  performance  properties  for  search 
heuristic  methods  provides  a  good  example  of  the  no  free  lunch  (NFL)  theorem.  According 
to  the  NFL  theorem  by  Wolpert  and  Macready  [63],  if  an  algorithm  does  particularly 
well  on  average  over  one  class  of  problems  then  it  must  do  worse  on  average  over  the 
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remaining  problems.  Thus,  when  a  comparison  of  performance  is  conducted  on  a  class  of 
problems  between  a  general  search  heuristic  and  a  tailored  algorithm,  the  tailored  algorithm 
frequently  outperforms  the  search  heuristic.  In  the  case  of  search  heuristics,  their  ability 
to  perform  generally  well  over  a  diverse  set  of  problems  infers  that  they  may  not  perform 
well  on  individual  problem  classes. 

When  considering  the  development  of  an  algorithm  to  solve  simulation  optimization 
problems,  the  underlying  algorithm  needs  to  be  general  enough  to  be  applicable  to  a  variety 
of  simulations  yet  narrow  enough  in  scope  to  avoid  issues  the  NFL  theorem  presents.  Ad¬ 
ditionally,  the  underlying  algorithm’s  convergence  theory  should  be  robust  enough  to  avoid 
overtailoring  the  algorithm,  since  simulations  are  often  "black  box"  systems.  Because  the 
convergence  theory  and  algorithmic  performance  are  generally  unreliable,  search  heuristics 
are  not  used  in  this  research. 

2.4  Generalized  Pattern  Search  (GPS)  Methods 

As  shown  in  Section  2.1,  the  use  of  the  general  optimization  algorithm  in  Figure  2.1 
does  not  ensure  convergence  to  a  stationary  point  of  the  optimization  problem  given  in 
Equation  (2.1).  Pattern  search  methods,  a  subclass  of  direct  search  methods,  have  been 
proven  convergent  to  a  stationary  point  by  the  use  of  a  conceptual  mesh.  Additionally, 
pattern  search  methods  have  been  modified  for  applicability  to  MVP  problems.  In  this 
section,  the  mesh  conditions  placed  on  the  pattern  search  methods  and  adaptations  of  GPS 
methods  are  discussed  in  relation  to  the  underlying  convergence  theory. 

2.4-1  Pattern  Search  Methods 

Before  1997,  the  direct  search  methods  were  generally  viewed  as  search  heuristics 
because  they  lacked  convergence  theory.  Torczon  [58]  made  a  significant  contribution  by 
not  only  demonstrating  that  many  of  the  seemingly  diverse  direct  search  methods  members 
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could  be  unified  under  a  generalized  subclass,  designated  as  pattern  search  methods,  but 
under  reasonable  assumptions,  this  new  subclass  of  methods  was  proven  convergent  to  a 
stationary  point.  Specifically,  if  the  objective  function  /  is  continuously  differentiable  in  a 
neighborhood  of  the  level  set, 


L{x o)  =  { x  :  f(x)  <  f(x o)}  (2.9) 

then  a  subsequence  of  GPS  iterates  will  globally  converge  to  a  point  x,  satisfying 

V/(x)  =  0;  that  is,  liminf*, _ >0O  ||V/(xfc)||  =  0.  Thus,  without  using  any  derivative 

information,  pattern  search  is  globally  convergent  to  a  stationary  point.  In  order  to  unify 
the  members  of  the  pattern  search  subclass,  Torczon  [58]  imposed  requirements  on  both  the 
location  of  candidate  iterates  and  the  update  operation  used  in  the  direct  search  methods. 
The  candidate  iterates  eligible  to  replace  the  current  incumbent  are  required  to  he  on  a 
conceptual  lattice  (or  mesh),  which  is  determined  by  the  direction  set  and  a  step  length 
parameter  maintained  by  the  algorithm. 

Once  the  set  of  candidate  points  is  established,  each  point  is  evaluated  by  the  objective 
function  and  compared  against  the  incumbent  .  If  improvement  is  found,  then  the  successful 
candidate  iterate  is  accepted  as  the  new  incumbent  and  step  size  is  either  maintained  or 
increased  (which  retains  or  coarsens  the  mesh,  respectively);  otherwise,  the  step  size  is 
decreased  (which  refines  the  mesh)  and  the  members  of  the  new  candidate  iterate  set  are 
evaluated  for  success.  It  is  important  to  note  that  the  use  of  the  mesh  makes  satisfaction 
of  a  sufficient  decrease  condition,  Equation  (2.3),  automatic,  thus  only  simple  decrease  is 
required  (Kolda  et  al.  [31]).  By  inspection,  the  conditions  of  convergence  for  the  pattern 
search  methods  are  analogous  to  the  conditions  required  in  Section  2.1.  In  particular, 
the  use  of  a  positive  spanning  set  (described  in  Section  3.1)  satisfies  the  angle  condition 
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and  the  updating  operation  of  the  step  size  ensures  that  the  steps  are  neither  too  long  nor 
too  short,  which  was  previously  ensured  by  the  use  of  sufficient  decrease  and  either  the 
curvature  condition  or  backtracking. 

Lewis  and  Torczon  [34]  further  generalized  the  algorithm  applying  the  theory  of  posi¬ 
tive  linear  dependence  presented  by  Davis  [17].  The  direction  set  is  selected  in  the  gener¬ 
alized  algorithm  so  that  it  forms  a  positive  spanning  set  for  the  domain  of  the  optimization 
function.  A  set  of  vectors  {ai,...,ap}  positively  spans  Mn  if  any  vector  x  E  Mn  can  be 
written  as  a  nonnegative  linear  combination  of  the  vectors  in  the  set;  i.e., 

x  =  Ai<2i  -T  •••  T  A pCip  (2.10) 

where  Aj  >  0  for  Vi  =  1  Through  this  construction  of  the  candidate  iterate  set, 

whenever  V/(x)  ^  0,  where  x  is  the  current  incumbent,  a  descent  direction  of  the  objective 
function  can  be  captured  in  at  least  one  member  of  the  candidate  iterate  set.  The  set 
{ai, ...,  ap}  is  called  positively  dependent  if  one  of  the  cii's  is  a  nonnegative  combination  of 
the  others;  otherwise,  the  set  is  positively  independent.  Since  a  positive  basis  is  a  positively 
independent  set  whose  positive  span  is  Rn,  a  positive  basis  is  the  smallest  proper  subset 
of  a  positive  spanning  set  that  still  positively  spans  Rn.  Davis  showed  that  any  positive 
basis  in  Rn  contains  between  n  +  1  and  2 n  elements  (referred  to  as  a  minimal  and  maximal 
set,  respectively);  therefore,  Lewis  and  Torczon  bound  the  worst  case  number  of  members 
in  the  candidate  set  per  iteration  to  n+  1  iterates.  As  noted  in  Dolan  et  al.  [21],  "the  key 
to  the  global  analysis  is  the  notion  of  having  searched  in  a  sufficient  number  of  directions 
from  the  current  iterate  to  guarantee  that  we  have  not  overlooked  a  potential  direction  of 
descent."  Thus,  the  use  of  a  positive  basis  enables  the  algorithm  to  use  the  fewest  number 
of  function  evaluations  without  the  accidental  avoidance  of  descent  directions. 
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Pattern  search  methods  unified  many  of  the  most  popular  direct  search  methods, 
including  the  coordinate  search  with  fixed  step  lengths,  the  original  Hooke-Jeeves  method 
[28],  EVOP  with  factorial  designs  (Box  [11]),  and  multidirectional  search  (Dennis  and 
Torczon  [20]).  However,  it  should  be  noted  that  one  of  the  first  and  most  popular  methods, 
the  original  Nelder-Mead  (downhill  simplex)  method  [44],  does  not  fall  into  the  class  of 
pattern  search  methods.  The  original  Nelder-Mead  method  is  based  on  the  use  of  a 
simplex,  which  is  a  geometrical  figure  that  has  one  more  vertex  than  dimension,  to  capture 
first-order  information.  The  method  uses  a  line  search  along  the  line  passing  through  the 
centroid  and  the  vertex  that  has  the  currently  least  favorable  objective  value.  The  line 
search  attempts  to  locate  a  new  point  for  the  vertex  that  produces  an  objective  function 
value  that  is  strictly  better  than  that  of  the  second  least  favorable  vertex.  If  such  a  point  is 
found,  the  vertex  is  moved  to  the  new  point  (thus,  deforming  the  shape  of  the  simplex)  and 
the  method  repeats  with  the  new  least  favorable  vertex.  If  the  line  search  is  unsuccessful 
in  locating  a  new  point,  all  the  vertices  are  contracted  toward  the  vertex  with  the  most 
favorable  objective  function  value.  The  result  of  this  relatively  simplistic  method  is  a 
simplex  that  is  able  to  quickly  adapt  to  the  local  topology  of  the  function.  However,  as 
detailed  by  Kolda  et  al.  [31],  the  unmodified  version  of  this  method  cannot  be  considered  a 
GPS  method  because  the  selection  of  the  candidate  set  does  not  ensure  an  existing  descent 
direction  is  located  and  the  update  operation  is  based  on  comparisons  to  the  second  least 
favorable  vertex  instead  of  the  current  incumbent  value. 

2.4-2  Extensions  to  Constrained  Problems 

Lewis  and  Torczon  ( [35]  and  [36] )  showed  that  placing  an  additional  requirement  on  the 
candidate  iterate  set  enables  the  convergence  theory  established  in  the  unconstrained  GPS 
algorithm  to  be  extended  to  bound  and  linearly  constrained  optimization  with  continuously 
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differentiable  objective  functions.  Specifically,  a  subsequence  of  GPS  iterates  converges  to 
a  point  x  satisfying  the  first-order  necessary  condition  for  optimality;  namely, 

V  f(x)T(x— x)  >  0  for  all  feasible  x.  The  additional  requirement  is  that  the  set  of  directions 
that  defines  the  candidate  iterate  set  must  be  chosen  to  conform  to  the  geometric  boundary 
of  the  active  constraints.  In  an  effort  to  keep  all  the  iterates  of  the  GPS  methods  feasible 
(thereby  producing  feasible  point  methods)  Lewis  and  Torczon  required  that  the  initial 
iterate  be  a  feasible  point  and  used  an  exact  penalization  approach ;  namely, 


min.F(x) 


(2.11) 


,  ev  N,  f  f(x),  if  x  €  D 
where  Fix)  =  <  ,  '  . 

v  '  1  Too,  otherwise 

and  !!  =  {i  £  Rn  |  l  <  x  <  u}  or  0  =  {x  G  Mn  |  l  <  Ax  <  u }  for  bound  and  linearly 
constrained  optimization  problems,  respectively. 

Audet  and  Dennis  [6]  presented  the  unification  of  the  unconstrained,  bound,  and  lin¬ 
early  constrained  versions  of  GPS  methods  by  applying  a  barrier  function  directly  to  the 
function  instead  of  Lewis  and  Torczon’s  approach  of  making  the  use  of  an  exact  penaliza¬ 
tion  approach  dependent  on  the  selection  of  GPS  algorithm.  The  barrier  approach  simply 
replaces  the  optimization  function  of  the  original  problem  with  fa,(x ),  which  is  given  by 


fa(x)  = 


f(x),  if  x  €  D 
Too,  otherwise 


(2.12) 


where  D  is  the  feasible  region.  By  considering  all  the  previous  GPS  algorithms  as  the 
same  algorithm  with  the  barrier  function  used  as  the  optimization  function,  the  analysis 
and  theoretical  results  are  applicable  to  all  the  previous  methods.  Similar  to  the  previous 
work,  the  step  size  ( mesh  size )  parameter  is  updated  conditionally  on  the  replacement  of 
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the  incumbent  and  the  candidate  iterates  are  required  to  lie  on  the  mesh  Mk,  which  is  given 


by 


Mk  =  {xk  +  AkDz  :  z  <e7)^}  (2.13) 

where  xk  G  Rn<?  is  the  current  iterate,  D  G  RnCxlI?l  is  a  positive  spanning  set,  and  Ak  G  R 
is  the  mesh  size  parameter.  Only  the  following  additional  rule  needs  to  be  enforced  to 
assure  convergence  to  a  stationary  point:  each  direction  dj  6  D,  for  j  =  1, ...,  \D\,  must  be 
of  the  form  d  =  Gz,  where  z  €  ZnCand  G  G  ZnCxn<?  is  a  nonsingular  generating  matrix. 

In  order  to  allow  the  user  the  flexibility  to  incorporate  search  heuristics  (see  Booker  et 
al.  [9]  and  [10]),  Audet  and  Dennis  ([6])  explicitly  separate  out  a  SEARCH  step  to  comple¬ 
ment  the  POLL  step,  which  is  different  but  equivalent  to  the  work  of  Torczon  ([58]).  During 
the  POLL  step,  the  members  of  the  poll  set  (candidate  set)  are  evaluated  and  compared  to 
the  incumbent  for  improvement.  The  poll  set  Pk  is  composed  of  mesh  points  neighboring 
the  current  incumbent  xk  in  the  directions  of  the  columns  of  Dk  C  D\  thus  the  poll  set  can 
be  expressed  as 


Pk  =  {xk  +  &kd  :  d  G  Dk}  (2.14) 

where  Dk  is  the  current  positive  spanning  set.  Thus,  there  is  great  freedom  in  choosing 
the  directions  of  the  positive  spanning  set. 

Since  the  convergence  results  do  not  depend  on  the  SEARCH  step,  the  user  is  provided 
this  optional  step  to  employ  a  finite  heuristic  to  accelerate  the  convergence  of  the  algorithm. 
If  either  the  SEARCH  or  POLL  step  produces  an  improved  mesh  point ,  then  the  current 
iteration  can  end,  the  mesh  size  parameter  is  kept  the  same  or  is  increased,  the  improved 
mesh  point  becomes  the  new  incumbent,  and  the  process  is  reiterated.  However,  if  both  the 
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General  Pattern  Search  Algorithm 

Initialization:  Choose  a  feasible  starting  point  xo  such  that  fn(x o)  <  oo. 

Let  D  be  a  positive  spanning  set. 

Let  the  Mq  c  O  be  the  mesh  defined  by  mesh  size  parameter  Ao  >  0  and  Do  £  D 
Set  the  iteration  counter  k  to  0. 

For  k  =  0, 1, ... 

1.  SEARCH  Step  (Optional):  Employ  some  finite  strategy  seeking  an  improved  mesh  point; 

1. e.,  xk+i  £  Mk  such  that  fn(xk+ i)  <  fn(xk). 

2.  POLL  Step:  If  the  SEARCH  step  does  not  find  an  improved  mesh  point,  evaluate  /o  at 
the  points  in  POLL  set  Pk  until  an  improved  mesh  point  xk+x  is  found  (or  until  done). 

3.  Update:  If  SEARCH  or  POLL  finds  an  improved  mesh  point  ,  then  update  xk+i ,  and  set 
A/c+i  =  Tmk  Ak  >  Ak  where  r  >  1  is  a  rational  number  that  remains  constant  over  all 
iterations,  and  the  integer  mk  satisfies  0  <  mk  <  mmax  for  some  fixed  integer  rnmax  >  0; 

Otherwise,  set  xk+\  =  xk,  and  set  A^+i  =  rmfc  Ak  <  Ak  where  r  is  a  rational  number  that 
remains  constant  over  all  iterations,  rmfc  £  (0, 1),  and  the  integer  mk  satisfies  mmin  <  mk  <  —  1 
for  some  fixed  integer  mmin. 

4.  Terminate  the  algorithm  if  the  stopping  criterion  is  met  or  if  the  budget  of  function 
evaluations  is  reached;  otherwise,  return  to  step  1. 

Figure  2.6.  General  Pattern  Search  Algorithm  (adapted  from  Audet  and  Dennis  [6]) 
SEARCH  or  POLL  step  fail  to  produce  an  improvement,  then  the  incumbent  is  declared  to  be 
a  mesh  local  optimizer ,  the  mesh  size  parameter  is  decreased,  and  the  process  is  reiterated. 

Although  not  directly  covered  in  this  research,  pattern  search  methods  have  also  been 
extended  to  problems  with  nonlinear  constraints.  Lewis  and  Torczon  [37]  apply  GPS  for 
bound  constraints  to  an  augmented  Lagrangian  (see  Conn  et  al.  [13])  formulated  from  the 
constraints  of  the  problem.  By  iteratively  reducing  the  terminating  mesh  size  parame¬ 
ter  and  using  the  results  of  one  iteration  to  provide  parameter  estimates  for  the  next,  the 
algorithm  converges  to  a  stationary  point  under  the  assumption  of  twice  continuous  dif¬ 
ferentiability  of  the  objective  and  constraint  functions.  The  drawback  in  practice  is  that 
performance  is  driven  by  a  problem-dependent  penalty  parameter. 
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Audet  and  Dennis  [8]  alternatively  extend  GPS  for  nonlinear  constraints  by  incor¬ 
porating  a  filter.  The  filter  algorithm  was  first  introduced  by  Fletcher  and  Leyffer  [23] 
as  a  method  to  globalize  sequential  quadratic  programming  (SQP)  and  sequential  linear 
programming  (SLP)  without  requiring  the  user  to  specify  the  weight  parameters  used  in 
an  alternative  merit  function  approach.  A  filter  algorithm  introduces  a  constraint  viola¬ 
tion  function  that  aggregates  constraint  violations  for  sampled  infeasible  points  and  uses 
a  bi-objective  approach  to  the  optimization  problem,  in  which  the  filter  accepts  an  iterate 
that  improves  either  the  objective  function  or  the  aggregate  constraint  violation  function. 
While  convergence  to  a  point  satisfying  first  order  optimality  conditions  is  not  proven  by 
the  use  of  the  resulting  algorithm  (since  the  convergence  results  that  are  guaranteed  de¬ 
pend  strongly  on  the  set  of  directions  used  in  the  POLL  step),  it  is  demonstrated  that  as 
a  richer  set  of  directions  are  used,  the  likelihood  of  achieving  a  point  satisfying  first-order 
optimality  conditions  increases.  The  pattern  search  filter  algorithm  reduces  to  the  basic 
GPS  algorithm  when  nonlinear  constraints  are  absent.  It  does  not  use  derivative  infor¬ 
mation,  requires  only  simple  decrease  in  function  value  for  a  mesh  parameter  update,  and 
does  not  require  any  constraint  qualifications  on  the  nonlinear  constraints. 

In  [7],  Audet  and  Dennis  introduce  the  Mesh  Adaptive  Direct  Search  (MADS)  algo¬ 
rithm  to  handle  general  constraints.  MADS  has  a  similar  structure  to  GPS;  however, 
MADS  introduces  a  poll  size  parameter  A)]  that  is  used  in  conjunction  with  the  mesh  size 
parameter  A™,  the  current  incumbent  Xk,  and  a  positive  spanning  matrix  D &  to  construct 
a  frame  (the  former  POLL  set,  renamed  to  align  with  the  Coope  and  Price  nomenclature  in 
[14])  used  during  the  POLL  step.  Although  both  the  poll  size  and  mesh  size  parameters  are 
used  to  construct  the  frame,  it  is  important  to  note  the  poll  size  parameter  does  not  directly 
influence  the  mesh.  This  distinction  allows  the  algorithm  to  choose  from  a  richer  pool  of 
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candidate  sets  for  the  direction  by  keeping  >  A™.  Thus,  by  letting  A™  go  to  zero  more 
rapidly  than  A^,  the  directions  in  D &  used  to  define  the  frame  may  then  be  selected  from 
a  larger  set  of  directions  that  become  dense  in  the  limit.  When  combined  with  the  barrier 
method  of  assigning  a  function  value  of  infinity  to  infeasible  iterates,  MADS  has  stronger 
convergence  properties  than  the  original  GPS  algorithm,  even  with  nonlinear  constraints. 

2.4-3  General  Pattern  Search  Extensions 

Materiel  to  this  research  was  the  development  of  a  framework  for  MVP  problems  with 
linear  constraints  by  Audet  and  Dennis  [5].  Having  already  defined  a  local  POLL  in  the 
GPS  algorithm,  the  Mixed  Variable  Generalized  Pattern  Search  (MGPS)  accommodates 
categorical  variables  by  dividing  polling  into  three  stages.  The  local  POLL  of  the  GPS 
algorithm  for  continuous  variables  is  augmented  with  polling  of  the  current  set  of  discrete 
neighbors  and  extended  polling  to  explore  around  promising  neighbors,  which  is  discussed 
in  Chapter  3.  Abramson  [2]  extended  MGPS  to  general  nonlinear  constraints  by  using 
a  filter.  This  is  pertinent  to  this  research  because  it  is  incorporated  into  the  NOMADrn 
MATLAB®  software  [1], 

It  is  also  important  to  mention  the  significant  contributions  made  in  the  application- 
related  research  for  GPS  algorithms.  Although  the  proximity  of  linear  constraints  requiring 
the  direction  set  D k  to  conform  to  the  geometry  of  the  boundary  is  presented  by  Lewis 
and  Torczon  [36],  the  algorithm  presented  for  constructing  the  required  directions  did  not 
directly  address  the  case  of  linearly  dependent  active  constraints.  Brezhneva  and  Dennis 
[12]  provide  a  general  algorithm  that  first  identifies  nonredundant  active  constraints  via 
a  projection  approach  and  the  solution  of  a  linear  program  and  then  constructs  sets  D *. 
taking  into  account  only  nonredundant  constraints. 
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To  further  characterize  the  quality  of  solutions  that  can  be  attained  by  GPS  methods, 
Abramson  [3]  is  able  to  prove  under  mild  conditions  some  limited  second-order  behavior 
results.  Even  without  first-order  derivatives,  under  mild  assumptions,  Abramson  proves 
that  GPS  cannot  converge  to  a  strict  local  maximizer,  and  with  the  additional  assumption 
that  the  objective  function  /  is  sufficiently  smooth,  that  the  use  of  a  2 n  orthonormal 
positive  basis  prevents  convergence  to  a  saddle  point  where  the  sum  of  the  eigenvalues  of 
the  Hessian  are  negative. 

Since  computational  efficiency  can  be  a  driving  factor  in  the  selection  of  an  optimiza¬ 
tion  algorithm  (particularly  for  direct  search  methods),  methods  for  increasing  efficiency 
by  using  available  information  are  usually  an  actively  investigated  research  topic  for  any 
computational  field.  Abramson  et  al.  [4]  demonstrate  that  the  use  of  any  derivative  in¬ 
formation  can  significantly  reduce  the  overall  number  of  function  evaluations  required  of 
a  GPS  algorithm  while  maintaining  the  known  convergence  properties.  The  method  uses 
any  available  gradient  information  to  prune  or  remove  directions  of  known  ascent  from  the 
positive  spanning  set  Dfc,  thus  producing  a  pruned  set  Df?.  The  pruned  set  D j?  is  then 
substituted  for  the  original  positive  spanning  set  in  the  standard  POLL  step  of  the  GPS  al¬ 
gorithm.  As  a  result,  even  rather  rough  approximations  to  the  gradient  (on  the  mesh)  are 
shown  sufficient  to  reduce  the  poll  set  to  a  singleton,  thus  requiring  only  a  single  function 
evaluation  at  each  poll  step. 

Computational  efficiency  has  also  been  shown  to  be  improved  by  using  gradient  in¬ 
formation  calculated  from  a  simplex  of  previously  stored  iterate  responses  (Custodio  and 
Vicente  [16]).  In  its  most  basic  form,  the  simplex  gradient  can  be  used  to  control  poll 
ordering  in  an  effort  to  reduce  the  number  of  function  evaluations  within  a  POLL  step  be¬ 
fore  an  improved  mesh  point  is  located.  If  the  simplex  gradient  is  used  to  approximate 
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the  true  gradient,  pruning  can  again  lead  to  a  singleton  in  the  poll  set.  Furthermore,  the 
simplex  Hessian  can  be  constructed  and  used  to  generate  an  approximation  to  the  New¬ 
ton  direction.  Simplex  derivative  information  can  also  be  used  to  form  a  candidate  set  to 
be  evaluated  during  the  SEARCH  step  or  to  impose  a  sufficient  decrease  condition  on  the 
update  of  the  mesh  size  parameter.  Thus,  without  requiring  additional  function  evalua¬ 
tions,  the  use  of  simplex  derivatives  can  be  used  successfully  to  increase  the  computational 
efficiency  of  the  GPS  algorithm. 

Kolda  et  al.  [31]  introduced  the  class  of  generating  search  set  (GSS)  methods  that 
contains  not  only  pattern  search,  but  also  moving  grids  (algorithms  inspired  by  Coope  and 
Price  [14]  that  conditionally  enable  the  mesh  to  change  between  iterations),  and  methods 
that  directly  enforce  the  sufficient  decrease  condition  of  Equation  (2.5)  without  derivative 
information,  such  as  those  found  in  [39]  by  Lucidi  and  Sciandrone,  instead  of  using  the 
simple  decrease  of  Equation  (2.3)  with  mesh  size  controls.  In  order  to  prove  convergence, 
the  GSS  methods  are  unified  under  three  principles  (which  are  the  same  requirements  of 

the  derivative-based  methods  discussed  in  Section  2.2): 

1.  the  algorithm  must  have  a  search  direction  that  is  a  descent  direction,  i.e.  a  search 
direction  p *.  where  p ^  V/(xfc)  <  0 

2.  GSS  methods  must  avoid  poor  search  directions,  i.e.  search  directions  must  satisfy  the 
sufficient  descent  condition  of  Equation  (2.4) 

3.  GSS  methods  must  avoid  poor  choices  of  step  lengths;  i.e.  the  steps  taken  must  not  be 
too  long  or  too  short 

Principles  1  and  2  are  necessarily  satisfied  for  all  members  of  the  GSS  class  because  the 
use  of  generating  sets  that  positively  span  the  domain  of  the  objective  function  ensure  that 
at  least  one  direction  is  a  descent  direction  that  is  not  orthogonal  to  the  gradient.  Since 
each  of  the  GSS  methods  decreases  the  mesh  size  parameter  after  unsuccessful  iterations, 
this  is  equivalent  to  a  multidirectional  line  search  with  backtracking  which  will  ensure  the 
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steps  are  not  too  short.  As  a  result,  the  only  requirement  for  a  GSS  method  to  converge 
to  a  stationary  point  is  to  have  a  means  to  avoid  choices  of  step  lengths  that  are  too  long. 
In  order  to  allow  for  the  inclusion  of  various  algorithms  within  the  GSS  class,  the  condition 
for  appropriate  step  lengths  is  generalized  as 

f(xk  +  Akpk)<f(xk)-p(Ak)  (2.15) 

where  the  forcing  function  p  is  used  to  change  the  inequality  between  the  simple  decrease 
condition  and  the  sufficient  decrease  condition.  For  algorithms  that  use  the  mesh  update  to 
limit  the  maximum  length  of  the  step  size,  such  as  GPS  and  moving  grids,  only  the  simple 
decrease  condition  needs  to  be  fulfilled.  For  these  algorithms,  the  forcing  function  is  simply 
set  to  zero  to  produce  the  simple  decrease  condition  of  Equation  (2.3).  For  algorithms  that 
do  not  restrict  the  maximum  step  size  length,  the  sufficient  decrease  condition  is  created 
by  requiring  p  to  be  a  continuous,  monotonically  increasing  function  that  is  linear  as  Ak 
decreases  to  zero.  Kolda  et  al.  [31]  demonstrate  that  every  member  of  the  GSS  class  is 
able  to  satisfy  the  conditions,  thereby  proving  convergence  for  GSS  methods. 

2.4-4  Suitability  of  Generalized  Pattern  Search  (GPS)  Methods 

Generalized  Pattern  Search  methods  are  well-suited  for  use  in  the  development  of 
an  algorithm  to  solve  simulation  optimization  problems.  Since  they  are  a  subclass  of 
direct  search  methods  that  do  not  use  derivative  information,  GPS  methods  are  directly 
applicable  to  problems  where  derivative  information  is  unavailable  and  cannot  be  accurately 
calculated.  Additionally,  GPS  methods  have  been  proved  convergent  not  only  as  a  class 
itself,  but  also  as  a  member  of  the  superclass  of  GSS  methods.  The  flexibility  of  GPS 
methods  has  been  demonstrated  by  tailoring  to  allow  for  the  handling  of  different  types  of 
constraints  and  variables  (to  include  MVP  problems). 
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Although  all  members  of  the  GSS  class  are  proven  to  be  convergent  to  a  stationary 
point,  the  generality  of  the  problem  structure  and  level  of  development  are  important  factors 
in  the  selection  of  the  underlying  optimization  algorithm.  For  example,  the  moving  grid 
algorithm  presented  by  Coope  and  Price  allows  greater  flexibility  of  the  mesh  and  could 
be  extended  to  handle  categorical  variables,  but  their  algorithm  has  not  been  encoded 
for  general  use.  Since  NOMADnr  is  an  algorithm  that  can  handle  MVP  problems,  the 
underlying  MGPS  algorithm  of  NOMADnr  will  be  modified  in  this  research  to  develop  an 
algorithm  that  can  efficiently  solve  simulation-based  optimization  problems  of  "black  box" 
systems. 

2.5  Implementation  Considerations 

With  the  underlying  algorithm  selected,  the  adaptation  of  the  current  algorithm  to 
stochastic  simulations  and  computational  efficiency  improvements  may  be  discussed.  In 
this  section,  the  use  of  ranking  and  selection  procedures  and  surrogate  searches  will  be 
discussed  as  they  relate  to  efficiently  solving  simulation-based  optimization  problems.  It 
will  be  shown  that  the  key  concept  in  both  modifications  is  to  gather  as  much  additional 
problem  information  as  possible  without  requiring  a  large  number  of  additional  function 
evaluations. 

By  considering  the  stochastic  simulation  as  a  response  generator,  a  typical  approach 
in  reducing  the  sampling  error  of  the  expected  performance  is  to  replicate  the  design  vector 
a  sufficiently  large  number  of  times.  This  approach  is  justified  by  the  weak  law  of  large 
numbers,  since  the  sample  mean  will  equal  the  population  mean  as  the  number  of  samples 
goes  to  infinity.  In  the  context  of  a  simulation  optimization  algorithm,  the  total  number  of 
function  evaluations  is  traditionally  a  limiting  factor  and  thus  a  trade-off  must  usually  be 
made  between  improving  the  solution  and  controlling  the  sampling  error.  In  this  research, 
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this  balance  will  be  maintained  through  the  use  of  statistical  procedures  that  require  a 
minimal  number  of  function  evaluations  for  a  given  confidence  level,  thereby  preserving 
function  evaluations  for  iterate  improvement. 

Since  the  selected  algorithm  relies  on  comparisons  between  stochastic  responses,  a 
review  of  statistical  procedures  for  detecting  these  differences  is  warranted.  Consider  a  set 
of  k  candidate  design  vectors  {Xi, ...,  X&}  under  consideration  with  true  objective  function 
values  denoted  fi  =  f(Xi)  =  E[F(Xj,  ?/?)],  where  i  =  1, ...,  k.  The  candidate  with  the  zth 
best  true  objective  function  value  will  be  denoted  by  X[j],  with  associated  true  objective 
function  .  The  desired  procedure  is  one  that  guarantees  selection  of  X^j  with  a  user- 
specified  probability  of  at  least  1  —  a,  which  can  be  expressed  as 

Pr{select  X{1]\f[{]  -  f{1]  >  0,  i  =  2, ...,  k}  >  1  -  a  (2.16) 

In  the  case  where  there  are  only  two  candidate  design  vectors,  one  can  simply  use  an 
appropriate  pairwise  comparison  test  (t— test)  to  perform  the  selection  at  the  given  level  of 
significance.  However,  when  there  are  at  least  three  candidates,  multiple  comparison  tests 
may  be  required. 

When  attempting  to  determine  which  candidate  provides  the  best  true  objective  func¬ 
tion  value,  a  common  first  step  in  multiple  comparison  testing  is  to  first  determine  if  there 
is  a  statistically  significant  difference  between  the  candidates.  Assuming  that  the  noise 
in  each  of  the  responses  is  independently  and  identically  distributed  (i.i.d)  normally  with 
mean  zero,  the  Tukey-Kramer  method  is  a  generalization  of  the  pairwise  t— test  that  pro¬ 
duces  a  common  acceptance  interval  for  the  differences  between  sample  mean  responses  for 
each  member  pair  of  the  candidate  set.  Since  the  null  hypothesis  of  the  test  is  that  all  the 
candidates  have  the  same  mean,  there  is  enough  statistical  evidence  at  the  given  a  level 
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of  significance  to  state  that  at  least  one  candidate  has  a  different  mean  value  if  any  of  the 
k(k  —  l)/2  pairs  of  mean  differences  he  outside  of  the  acceptance  interval.  Although  this 
result  is  rather  broad,  it  may  provide  an  indication  of  which  candidate (s)  provide  a  better 
response. 

After  a  test  of  equal  means  has  been  performed,  multiple  comparison  testing  attempts 
to  find  the  candidate  that  provides  the  best  true  objective  function  value  by  eliminating 
inferior  candidates.  A  standard  method  to  perform  this  culling  of  candidates  is  through 
multiple  comparison  with  the  best  (MCB)  testing.  By  assuming  some  level  of  prior  in¬ 
formation  about  which  candidate  is  the  optimum,  there  are  only  k  —  1  pairs  of  candidates 
that  require  one-way  testing,  as  opposed  to  the  two-way  testing  of  k(k  —  l)/2  pairs  of  can¬ 
didates  in  the  Tukey-Kramer  method.  Therefore,  MCB  tests  can  provide  stronger  results 
than  those  of  the  Tukey-Kramer  method. 

The  MCB  test  can  be  constructed  with  a  null  hypothesis  in  which  all  the  candidates 
produce  responses  at  least  as  good  as  the  assumed  optimum.  Thus,  just  as  with  the  first 
multiple  comparison  test,  the  desire  is  to  reject  this  hypothesis.  For  the  assumed  optimum 
to  be  the  true  best,  the  difference  in  sample  mean  responses  between  the  optimum  and 
each  candidate  must  he  within  an  "appropriate"  acceptance  region,  which  is  calculated 
from  available  information.  Thus,  if  Ei  represents  the  event  that  the  difference  in  sample 
mean  responses  of  the  optimum  and  the  candidate  i  lies  within  the  acceptance  region  (that 
candidate  i  produces  a  response  at  least  as  favorable  as  the  optimum) ,  then  the  event  E  that 
the  optimum  is  in  the  overall  acceptance  region  (that  all  the  candidates  produce  a  response 
at  least  as  favorable  as  the  optimum)  is  given  by  E\  fl ...  fl  Ek.  Thus,  the  probability  that 
the  optimum  is  the  best  is  given  by 
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Pr{E}  =  Pr{Ei  n ...  fl  Ek} 


(2.17) 


Since  Equation  (2.16)  requires  that  PrlE}  >  1  —  a,  the  joint  probability  is  given  by 


Pr{Ein...nEfc}  >  1-a  (2.18) 

The  acceptance  regions  for  each  of  the  individual  candidates  can  be  calculated  by 
partitioning  the  level  of  confidence  a  among  the  candidates.  The  most  straightforward 
manner  is  through  the  Bonferroni  inequality,  which  produces  an  acceptance  region  bound  of 

Pr{Ei}  =  (l-a)/(k-l)  (2.19) 

for  each  candidate.  If  the  additional  assumption  that  each  of  the  differences  in  sample 
mean  responses  between  the  optimum  and  the  candidate  are  jointly  normally  distributed 
with  mean  zero  is  maintained,  then  the  acceptance  region  bound  can  be  improved  through 
the  use  of  the  Slepian  inequality  (see  Tong  [57,  p.8] ) .  The  resulting  acceptance  region 
bound  for  each  candidate  is  given  by 


Pr {Ei}  =  (1  -  a)1/!*-1)  (2.20) 

Once  the  individual  candidate  acceptance  regions  are  formed  from  the  derived  level  of  con¬ 
fidence  and  the  sample  variances,  the  difference  in  each  sample  mean  response  between  the 
optimum  and  the  candidate  is  tested  against  its  acceptance  bound.  If  the  null  hypothesis 
is  rejected  and  each  of  the  individual  acceptance  bounds  are  broken,  then  strong  evidence 
exists  that  the  assumed  optimum  is  the  true  best. 
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From  this  review,  it  is  apparent  that  the  type  of  test  used  for  guaranteed  correct 
selection  of  the  best  candidate  is  dependent  on  the  amount  of  information  available  and 
simplifying  assumptions  taken.  With  this  foundation,  the  development  of  ranking  and 
selection  procedures  and  how  they  apply  to  this  research  may  be  discussed.  Trosset  [60] 
demonstrated  that  although  optimization  in  the  presence  of  random  noise  is  more  difficult 
than  a  noise-free  case,  statistical  testing  for  iterate  selection  can  be  used  in  pattern  search 
methods  to  maintain  convergence  results.  However  to  do  so,  the  number  of  replications 
per  iteration  increases  faster  than  the  squared  reciprocal  of  the  mesh  size  parameter  as  the 
sequence  converges  (Trosset  [60]).  Sriver  [55]  established  an  algorithmic  framework  for  GPS 
methods  with  stochastic  responses  and  proposed  ranking  and  selection  (R&S)  methods  for 
the  statistical  testing  of  GPS  iterates.  Sriver  [55]  overcomes  the  problems  identified  by 
Trosset  [60]  through  the  use  of  an  indifference  zone. 

Ranking  and  selection  methods  are  slightly  different  from  multiple  comparison  tests. 
The  goal  of  MCBs  is  to  characterize  the  members  of  the  candidate  set;  whereas,  the  goal 
of  R&S  methods  is  to  actively  screen  the  k  candidates  to  find  a  subset  likely  to  contain  the 
optimum.  MCBs  are  generally  a  tool  of  inference,  where  the  confidence  intervals  provide 
information  on  how  close  the  systems  may  be  to  one  another.  The  resulting  confidence 
intervals  can  be  combined  with  other  factors  in  selecting  an  overall  best  candidate.  R&S 
methods  are  a  tool  for  finding  the  optimum  set  without  the  need  to  characterize  the  candi¬ 
date  set.  Since  the  focus  of  this  research  is  strictly  concerned  with  finding  improvement, 
R&S  methods  are  more  applicable. 

Proposed  by  Paulson  [47],  fully  sequential  R&S  procedures  with  indifference  zone  use 
a  triangular  continuation  region  to  determine  when  enough  information  has  been  gathered 
to  find  the  optimum.  Indifference  zone  procedures  differ  slightly  with  the  probability 
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Figure  2.7.  Triangular  Continuation  Region 

statement  in  Equation  (2.16)  by  the  use  of  an  indifference  parameter  5  >  0.  This  user- 
selected  parameter  is  essentially  the  smallest  "important"  difference  between  solutions, 
often  referred  to  as  the  practical  difference.  That  is,  points  with  function  values  within  <5 
of  each  other  are  considered  equivalent.  These  procedures  select  X^  with  a  user-specified 
probability  of  at  least  1  —  a  whenever  the  difference  is  worth  detecting,  which  can  be 
expressed  as 


Pr{ select  X[{]\ f[{[  -  f{1]  <5,i  =  2 >  1  -  a  (2.21) 

By  approximating  the  sum  of  differences  between  two  systems  as  Brownian  motion,  Paulson 
developed  a  triangular  continuation  region,  illustrated  in  Figure  2.7,  that  acts  in  part  as 
an  acceptance  region  of  the  above  comparative  tests.  The  procedure  iteratively  sums  the 
difference  between  the  two  systems  as  long  as  the  result  lies  within  the  triangular  region. 
The  sum  can  leave  the  triangular  region  in  three  ways.  If  the  sum  crosses  the  acceptance 
bound,  then  the  system  associated  with  the  null  hypothesis  is  declared  the  optimum.  If  the 
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sum  crosses  the  rejection  bound,  then  the  system  associated  with  the  alternative  hypothesis 
is  declared  the  optimum.  Otherwise,  if  the  sum  exits  without  triggering  a  bound,  the 
system  with  the  more  favorable  average  responses  is  declared  the  optimum.  Thus,  the 
triangular  region  indicates  when  one  system  is  clearly  superior  to  the  other,  resulting  in  a 
selection  of  a  candidate  that  satisfies  Equation  (2.21). 

The  original  R&S  procedure  has  undergone  modification  to  increase  efficiency  by  using 
tighter  bounds,  similar  to  the  use  of  the  Slepian  inequality  for  MCBs  acceptance  bound. 
Hartmann  [25]  improved  upon  Paulson’s  procedure  by  replacing  Boole’s  inequality  with  a 
geometric  inequality  and  using  a  Brownian  motion  bound  on  the  acceptance  region  instead 
of  a  large  deviation  bound  under  the  assumption  of  normal  and  common  variance  noise. 
Kim  and  Nelson  [29]  further  extend  Hartmann’s  work  by  the  handling  of  unequal  and 
unknown  variances.  Pichitlamken  and  Nelson  [48]  introduced  Sequential  Selection  with 
Memory  (SSM),  which  is  similar  to  the  Kim  and  Nelson  procedure  but  includes  previously 
sampled  responses  in  the  current  R&S  procedure.  Sriver  [55]  demonstrated  that  SSM  per¬ 
formed  well  with  GPS  methods  in  controlling  the  number  of  required  function  evaluations 
while  guaranteeing  the  proper  level  of  confidence  in  the  iterate  selection.  Through  the  use 
of  an  efficient  R&S  procedure,  the  MGPS  algorithm  within  NOMADrn  can  be  extended  to 
stochastic  simulation  problems  without  requiring  a  prohibitive  number  of  function  evalua¬ 
tions;  therefore,  SSM  is  used  in  this  research  to  control  the  selection  of  iterates. 

In  addition  to  modifying  the  current  NOMADrn  code  to  handle  stochastic  simulations, 
this  research  seeks  to  meet  the  goals  of  increasing  the  computational  efficiency  of  the  opti¬ 
mization  algorithm  and  performance  testing  of  appropriate  surrogates.  As  noted  by  Booker 
et  al.  [10],  surrogates  that  serve  as  approximations  of  an  expensive  simulation,  constructed 
by  interpolating  or  smoothing  known  values  of  the  objective,  can  be  used  to  reduce  the 
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number  of  function  evaluations  required  to  discover  a  point  that  produces  decrease.  Ad¬ 
ditionally,  if  the  response  surface  has  more  than  one  optimum,  the  use  of  surrogates  could 
lead  to  stationary  points  with  better  objective  function  values.  Therefore,  this  research  ex¬ 
amines  the  use  of  surrogates  for  both  local  and  global  searches,  which  is  discussed  in  more 
detail  in  Chapter  4.  In  particular,  Latin  hypercube  sampling  (LHS)  will  be  used  as  an 
initial  space-filling  global  search  while  the  local  search  will  be  aided  either  by  the  use  of  a 
parametric  model-fitting  approach  of  Kriging  or  a  nonparametric  fitting  method  of  kernel 
regression. 

2. 6  Summary 

After  reviewing  the  requirements  for  optimization  algorithms  to  prove  convergence  to  a 
stationary  point,  the  GPS  class  of  algorithms  are  shown  to  adhere  to  the  same  requirements 
for  convergence  as  the  gradient-based  methods.  However,  since  the  GPS  methods  are 
suitable  for  simulation  optimization  (only  requiring  simple  decrease  in  objective  function 
evaluation  by  the  use  of  a  mesh  and  having  been  adapted  to  mixed  variable  domains),  a 
GPS  methods  is  used  in  this  research.  The  next  chapter  presents  the  approach  for  using 
GPS  with  ranking  and  selection  procedures. 
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Chapter  3  -  Approach 

This  chapter  discusses  in  more  detail  the  fundamental  concepts  of  the  MGPS  algorithm 
and  the  ranking  and  selection  procedure  used  in  this  research.  First,  the  construction  of 
the  mesh  and  the  poll  set  of  pattern  search  methods  are  adapted  for  mixed  variable  domains 
in  a  way  that  maintains  convergence  properties.  Then,  a  method  for  handling  bound  and 
linear  constraints  within  MGPS  algorithm  is  described.  Finally,  the  proposed  algorithm 
modifications  are  presented  with  associated  implementation  considerations. 

3.1  Mesh  and  Poll  Set  Construction 

As  noted  in  Chapter  2,  the  MGPS  algorithm  is  related  to  was  developed  from  the 
GPS  algorithm  of  Audet  and  Dennis  [6];  thus  there  are  common  restrictions  imposed  on 
the  iterates  to  ensure  proven  convergence  to  a  stationary  point.  In  particular,  at  any 
iteration  k,  all  iterates  are  required  to  lie  on  the  mesh  M/-.  Although  the  mesh  is  not 
actually  constructed,  the  concept  of  the  mesh  is  essential  to  the  construction  of  the  POLL 
set  from  which  candidate  iterates  are  generated.  In  order  to  define  the  mesh  in  the  context 
of  MVP  problems,  first  recall  the  notation  presented  in  Equation  (1.3):  each  vector  igU 
can  be  denoted  as  x  =  ( xc ,  xd )  where  xc  €  are  the  continuous  variables  of  dimension 
nc  and  xd  €  Znd  are  the  discrete  variables  of  dimension  nd.  By  fixing  the  values  of  the 
integer  space  Zn<i,  a  discrete  plane  can  be  defined  as  a  section  of  the  mesh  where  only  the 
continuous  values  may  vary.  Additionally,  since  the  predefined  categorical  list  is  finite, 
each  discrete  plane  i,  i  =  1,  ax,  can  be  identified  uniquely. 

The  mesh  construction  is  formulated  to  provide  enough  generality  to  be  applicable 
to  MVP  problems,  yet  revert  back  to  the  single  discrete  plane  mesh  structure  for  GPS 
methods  given  in  Equation  (2.13).  In  order  to  guarantee  convergence  in  GPS  methods, 
the  set  of  directions  used  to  generate  the  POLL  set  must  be  sufficiently  rich  to  produce  a 
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descent  direction.  This  condition  is  met  by  requiring  the  direction  set  to  positively  span 
the  domain  of  the  objective  function  /.  The  following  definitions  introduced  by  Davis  [17] 
provide  the  foundation  for  the  use  of  positive  spanning  sets  within  GPS  methods: 

Definition  3.1  A  positive  combination  of  the  set  of  vectors  V  =  {vi}l=1  is  a  linear  com- 

r 

bination  ^  CiVi,  where  Cj  >  0,  i  =  1, 2, . . . ,  r. 
i=i 


Definition  3.2  A  finite  set  of  vectors  W  =  {vJi}l=1  forms  a  positive  spanning  set  for  Mn 
if  every  v  G  Mn  can  be  expressed  as  a  positive  combination  of  vectors  in  W.  The  set  of 
vectors  W  is  said  to  positively  span  Rn. 


Definition  3.3  A  positive  spanning  set  of  vectors  W  is  said  to  be  a  positive  basis  for  Mn 
if  no  proper  subset  of  W  positively  spans  Mn. 

Davis  [17]  proved  that  the  cardinality  of  any  positive  basis  in  Mn  is  between  n  +  1  (a 
minimal  set)  and  2 n  (a  maximal  set).  Examples  of  minimal  and  maximal  sets  in  matrix 
notation  can  be  quickly  generated  respectively  by  using  the  associated  matrices  [/;  — e]  and 
[I;  —I],  where  I  is  the  identity  matrix  and  e  is  the  vector  of  ones,  and  are  commonly  selected 
as  the  positive  bases.  The  key  purpose  in  using  positive  spanning  sets  in  GPS  is  derived 
from  the  theorem: 

Theorem  3.4  (Davis  [17])  A  set  D  positively  spans  Rn  if  and  only  if,  for  all  nonzero 
v  G  Rn,  vTd  >  0  for  some  d  G  D. 

Thus,  if  a  positive  spanning  set  D  is  constructed  at  a  point  x  where  the  gradient  V/(x) 
exists  and  is  nonzero,  then  by  Theorem  3.4  at  least  one  element  of  D  is  required  to  be  a 
descent  direction  (since  V f(x)Td  <  0  for  some  d  €  D  and  v  =  — V/(x)). 

For  each  discrete  plane  i  =  1, ...,  imax,  a  set  of  positive  spanning  directions  Dl  G 
x  I  D‘  is  constructed  by  forming  the  product 
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Dl  =  GiZi 


(3.1) 


where  Z%  €  RnCxlZ)!l  and  Gi  €  MnCxnC  is  a  nonsingular  matrix  for  the  discrete  plane.  (The 
notation  is  abused  here  and  throughout  this  document,  in  that  the  set  D1  is  represented  in 
Equation  (3.1)  as  a  matrix  whose  columns  are  elements  of  the  set.)  This  additional  restric¬ 
tion  is  required  for  convergence  and  is  discussed  in  more  detail  by  Torczon  [58] .  Although 
a  common  choice  for  Gi  is  the  identity  matrix,  Audet  and  Dennis  [5]  note  that  the  generat¬ 
ing  matrices  Gi  (and  thus  Dl)  that  define  the  discrete  plane  meshes  can  be  determined  as 
the  iteration  unfolds  as  long  as  only  a  finite  number  of  them  are  generated.  Therefore,  the 
use  of  Equation  (3.1)  in  restricting  the  selection  of  positive  spanning  directions  maintains 
the  original  flexibility  from  the  GPS  methods,  but  allows  for  MVP  problem  structures  and 
reduces  to  the  single  discrete  plane  version  used  in  GPS  methods  when  discrete  variables 
are  absent. 

The  mesh  is  formed  as  the  direct  product  of  Xd  with  the  union  of  the  local  meshes  for 
each  possible  categorical  combination  setting,  which  can  be  expressed  as 

Mk  =  Xd  x  I  JW{4  +  A kD*z  €  Xc  :  z  €  z|P (3.2) 

where  xk  €E  Mn<?  is  the  current  iterate  and  Ak  is  the  mesh  size  parameter.  The  POLL  set 
Pk  is  composed  of  the  continuous  neighbors  of  the  current  incumbent  xk  in  the  directions 
of  the  columns  of  Dlk,  thus  the  POLL  set  can  be  expressed  as 

Pk  =  {xk  +  A k(d,  0)  E  D  :  d  E  Dk}  (3.3) 
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where  the  current  iterate  Xk  E  fl  lies  in  the  discrete  plane  i,  Ak  G  R  is  the  mesh  size 
parameter,  and  D'k  C  Dl  is  the  current  positive  spanning  set  of  the  discrete  plane  i.  The 
notation  (d,  0)  indicates  that  only  the  continuous  variables  may  change;  thus, 
xk  +  A k(d,  0)  =  (x%  +  A fcd,  xf). 

3.2  Optimality  for  Mixed  Variable  Domains 

In  order  to  properly  apply  optimization  algorithms  for  MVP  problems,  some  basic 
conceptual  definitions  used  in  optimization  must  be  adapted  to  mixed  variable  domains. 
For  example,  the  notion  of  local  optimality  must  be  revised  to  take  into  account  the  discrete 
planes  illustrated  by  the  structure  of  the  mesh.  Because  continuous  and  (traditional) 
discrete  variables  can  be  represented  as  ordered  sets,  local  optimality  is  well-defined  in 
terms  of  local  neighborhoods;  however,  for  categorical  variables,  the  concept  of  a  local 
neighborhood  must  be  defined  by  the  user.  Abramson  [2]  defines  the  set  of  discrete 
neighbors  in  terms  of  a  set-valued  function  as  TV"  :  II  — ►  2n,  where  denotes  the  power 
set,  which  is  the  set  of  all  possible  subsets  of  fk  Thus,  y  E  TV(x),  means  that  y  is  a  discrete 
neighbor  of  x.  In  the  context  of  MVP  problems,  all  iterates  are  required  to  lie  on  the  mesh; 
therefore  the  function  TV"  must  be  constructed  so  that  every  discrete  neighbor  of  the  current 
iterate  must  also  he  on  the  mesh,  i.e.  J\f(xk )  C  Mk  for  all  k  =  0,1,...  Since  the  categorical 
variables  are  limited  to  a  predefined  list,  TV(x)  is  required  to  be  finite.  Additionally,  the 
set- valued  function  TV"  has  the  reflective  property,  so  that  x  E  N(x)  for  each  x  E  Cl. 

With  a  well-defined  neighborhood  function  for  the  categorical  variables,  local  optimal¬ 
ity  in  a  mixed  variable  domain  can  now  be  defined  in  terms  of  the  discrete  neighbor  set. 

Definition  3.5  A  point  x  =  ( xc,xd )  E  f2  is  a  local  minimizer  of  /  with  respect  to  the  set 
of  neighbors  J\f(x)  C  0  if  there  exists  an  e  >  0  such  that  f[x)  <  f(v)  for  all  v  in  the  set 

HP  (J  (B(yc,e)xyd)  (3.4) 

ye  M(x) 
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where  B(yc,e )  is  an  open  ball  of  radius  e  centered  at  yc.  Because  this  definition  applies  to 
the  local  neighborhoods  of  all  the  discrete  planes  in  the  discrete  neighbor  set,  this  definition 
imposes  a  stronger  condition  than  simply  requiring  optimality  with  respect  to  the  discrete 
neighbor  set  and  the  local  continuous  neighborhood  of  the  incumbent.  Additionally,  the 
quality  of  the  local  minimizer  is  directly  related  to  the  definition  of  TV".  By  increasing  the 
number  of  members  in  the  discrete  neighbor  set,  the  local  minimizer  becomes  more  global; 
however,  the  number  of  function  evaluations  needed  to  satisfy  the  optimality  condition 
increases. 

In  general,  optimization  requires  that  for  a  point  to  be  considered  a  local  minimizer, 
a  first-order  necessary  condition  must  be  satisfied.  As  with  the  previous  definition,  this 
condition  also  requires  revision  in  order  to  apply  to  the  mixed  variable  domain.  In  doing 
so,  Vc/  denotes  the  gradient  of  /  with  respect  to  the  continuous  variables.  The  following 
definition  given  by  Sriver  [55]  is  presented  by  Lucidi  et  al.  [38]  in  a  different  form: 

Definition  3.6  A  point  satisfies  first-order  necessary  conditions  for  optimality  if 

1.  ( wc  —  xc)T\7cf(x )  >  0  for  any  feasible  (wc,xd)  G  Q; 

2.  /(x)  <  f(y)  for  any  discrete  neighbor  y  G  TV(x)  C  0; 

3.  ( wc  —  yc)TVc/(y)  >  0  for  any  discrete  neighbor  y  G  TV(x)  satisfying  f(y)  =  f(x)  and 
for  any  feasible  ( wc ,  yd )  G  Cl. 

Definition  3.6  requires  stationarity  at  x  with  respect  to  the  continuous  variables  (con¬ 
dition  1),  optimality  with  respect  to  the  discrete  neighbors  (condition  2),  and  stationarity 
with  respect  to  the  continuous  variables  at  each  discrete  neighbor  of  x  (condition  3).  Fi¬ 
nally,  the  definition  of  convergence  must  be  revised  to  fit  within  the  context  of  a  mixed 
variable  domain. 
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Definition  3.7  (Abramson  [2])  Let  12  C  (RnC  x  Znd)  be  a  mixed  variable  domain.  A 
sequence  { Xi }  G  12  is  said  to  converge  to  x  G  12  if,  for  every  e  >  0,  there  exists  a  positive 
integer  N  such  that  xf  =  xd  and  \\x%  —  xc\\  <  e  for  all  i  >  N. 


Under  the  mild  conditions,  Audet  and  Dennis  [5]  showed  that  for  bound  constrained 
optimization  of  MVP  problems,  the  MGPS  algorithm  converges  to  a  point  satisfying  the 
first-order  necessary  conditions  for  optimality,  which  is  extended  to  linear  constraints  by 
Abramson  [2]. 

3.3  Bound  and  Linear  Constraints 

For  bound  and  linear  constraints,  the  barrier  approach  described  in  Section  2.4.1  is 
applied  in  conjunction  with  the  tangent  cone  generator  approach  of  Lewis  and  Torczon  [36] 
across  each  discrete  plane.  In  the  barrier  approach,  infeasible  points  are  not  evaluated  and 
their  function  values  are  set  to  +oo.  While  this  prevents  infeasible  points  from  replacing 
the  incumbent,  it  does  not  guarantee  convergence  to  a  stationary  point.  Lewis  and  Torczon 
[36]  establish  that,  in  order  for  GPS  to  converge  to  a  first-order  stationary  point  of  a 
linearly  constrained  problem,  each  direction  set  Dl  must  be  sufficiently  rich  so  that  the 
polling  directions  Dlk  can  be  chosen  to  conform  to  the  geometry  of  the  nearby  constraint 
boundaries.  This  is  done  by  including  in  each  Dl  the  tangent  cone  generators  for  every 
point  in  12c.  The  tangent  cone  is  defined  as  follows  (Nocedal  and  Wright  [46]): 


Definition  3.8  A  vector  w  €  Mn  is  tangent  to  X  at  x  €  X  if,  for  all  vector  sequences 
{ Xi }  with  Xi  — >  x  and  x$  €  X,  and  all  positive  scalar  sequences  U  J,  0,  there  is  a  sequence 
Wi  — >  w  such  that  Xi  +  Uwi  €  X  for  all  i.  The  tangent  cone  at  x  is  the  collection  of  all 
tangent  vectors  to  X  at  x. 


Since  linear  constraints  form  a  convex  set,  for  x  €  X,  the  tangent  cone  to  X  at  x  can  be 
expressed  as  Tx(x)  =  cl {^{w  —  x)  :  fi  >  0,w  €  X}. 
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K{X,£) 


K°(  X,£) 

Q. 

Figure  3.1.  Tangent  Cone  Near  Boundary  (adapted  from  Lewis  and  Torczon  [36]) 

As  presented  by  Lewis  and  Torczon  [36],  algorithms  which  conform  to  the  geometry 
of  the  linear  constraint  boundaries  need  to  conform  only  to  the  nearby  boundaries.  If  the 
current  iterate  is  within  e  >  0  of  a  constraint  boundary,  then  the  tangent  cone  K°(x,e ) 
can  be  represented  as  the  polar  of  the  cone  K(x ,  e)  of  normals  for  the  constraints  within  e 
of  Xk-  A  representative  of  the  geometric  relationship  between  these  boundaries  and  cones 
is  illustrated  in  Figure  3.1. 

The  following  definition  from  Audet  and  Dennis  [6]  formalizes  this  concept. 


Definition  3.9  A  rule  for  selecting  the  positive  spanning  sets  D &  =  D(k,  Xk )  C  D  conforms 
to  X  for  some  e  >  0,  if  at  each  iteration  k  and  for  each  y  in  the  boundary  of  X  for  which 
||y  —  Xk\\  <  £,  Tx(y )  is  generated  by  nonnegative  linear  combination  of  the  columns  of  a 
subset  of  Dk- 


By  this  definition,  when  x*,  G  0  is  not  near  a  boundary,  there  are  no  additional  requirements 
imposed  on  the  positive  spanning  set;  it  is  only  when  Xk  G  D  is  near  a  boundary  that  the 
positive  spanning  set  is  required  to  contain  directions  that  conform  to  the  boundaries  of 
the  active  constraints. 
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3.4  MGPS  Algorithm 


The  mixed  variable  generalized  pattern  search  (MGPS)  algorithm  for  deterministic 
optimization  extends  GPS  methods  to  MVP  problems  through  modification  to  the  polling 
performed  on  the  feasible  region.  As  noted  in  Section  2.4.1,  the  SEARCH  step  is  not  required 
to  guarantee  convergence;  thus  the  only  restriction  of  the  SEARCH  step  is  that  it  evaluates 
a  finite  number  of  mesh  points,  denoted  as  Sk  C  Mk ■  Since  the  mesh  is  defined  for  all  the 
discrete  planes  of  the  MVP  structure,  it  should  be  noted  that  in  the  MGPS  algorithm  the 
search  for  points  of  improvement  is  not  limited  to  the  current  discrete  plane. 

For  the  treatment  of  categorical  variables,  polling  in  the  feasible  region  is  conducted 
in  separate  POLL  and  EXTENDED  POLL  steps.  At  iteration  k,  the  POLL  step  evaluates 
points  from  the  set  Pk(xk )  and  from  the  set  of  discrete  neighbors  N(xk).  If  the  algorithm 
fails  to  find  improvement  from  the  SEARCH  and  POLL  steps,  the  EXTENDED  POLL  step  is 
conducted  to  poll  around  points  in  the  set  of  discrete  neighbors  whose  objective  function 
value  is  sufficiently  close  to  that  of  the  incumbent.  To  identify  such  points,  the  extended 
poll  condition,  given  by 


f(xk)  <  f{y)  <  f(xk )  +  4  (3.5) 

must  be  satisfied  by  a  point  y  G  N(xk)  in  the  discrete  neighbor  set  of  the  incumbent  xk, 
to  be  considered  for  extended  polling.  The  £ k  parameter,  referred  to  as  the  extended  poll 
trigger  at  iteration  fc,  is  a  scalar  satisfying  £k  >  £  >  0,  where  £  is  a  user-defined  lower 
bound.  It  is  typically  set  as  a  percentage  of  the  objective  function  value  (but  bounded 
away  from  zero)  (see  Abramson  [2]).  In  a  manner  similar  to  the  user-defined  neighbor  set, 
the  quality  of  the  local  minimizer  is  determined  in  part  by  the  selection  of  the  tolerance 
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value  £.  By  increasing  £,  the  search  for  points  of  improvement  becomes  more  global,  since 
extended  polling  will  be  triggered  more  frequently. 

For  discrete  neighbors  that  satisfy  the  extended  poll  condition,  polling  is  conducted  in 
a  manner  similar  to  the  POLL  step  of  the  original  GPS  method;  that  is,  polling  is  restricted 
to  the  discrete  plane  of  the  current  discrete  neighbor  point.  By  initiating  the  polling  around 
a  particular  discrete  neighbor  yk,  the  extended  polling  sequence  {yk}Jj'Li  is  terminated  when 
either  f(yJkk  +  Ak(d,0))  <  f(xk)  for  some  d  <E  Dk(ykk)  or  f(xk)  <  f(yJkk  +  Ak(d,0))  for 
all  d  6  Dk(ykk),  where  Jk  is  the  total  number  of  extended  poll  points  considered  in  the 
EXTENDED  POLL  step.  By  this  construction  of  extended  polling,  the  set  of  extended  poll 
points  evaluated  about  a  particular  discrete  neighbor  yk  can  be  denoted  as 

£{yk)  =  {pk(yJk)}.=1  (3-6) 

regardless  of  the  terminating  condition  of  the  polling  sequence.  Thus,  the  set  of  all  extended 
poll  points  considered  by  the  EXTENDED  POLL  step  at  iteration  k  is  defined  as 

Xk{£k)  =  U  £(Vk)  (3.7) 

where  Afk  =  {y  E  A f(xk)  :  f(xk)  <  f(y)  <  f(xk)  +Cfc}  is  the  subset  of  discrete  neighbors 
that  trigger  extended  polling. 

The  trial  set  Tk  that  is  used  to  conditionally  update  the  mesh  size  parameter  can  now 
be  expressed  as  the  union  of  all  the  iterates  generated  by  the  SEARCH,  POLL,  and  EXTENDED 
POLL  steps,  denoted  as  Tk  =  Sk^Pk{xk)VJAf{xk)^Xk{Ek).  Abramson  [2]  uses  this  notation 
to  formalize  the  following  definitions  for  MVP  problems: 

Definition  3.10  If  f(y)  <  f(xk)  for  some  y  €  Tk,  then  y  is  said  to  be  an  improved  mesh 
point. 
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Definition  3.11  If  f(y)  >  f(xk)  for  all  y  €  Pk(xk )  U  Af(xk)  U  Xk(£k)>  then  y  is  said  to  be 
a  mesh  local  optimizer. 


The  updating  of  the  mesh  parameter  is  performed  in  exactly  the  same  manner  as  in 
the  original  GPS  method,  given  in  Figure  2.6.  Specifically,  when  an  improved  mesh  point 
is  found  within  trial  set  2\,  the  iteration  is  considered  successful  and  the  mesh  parameter 
is  updated  as 


Afc+1  =  Tmk  Afc  (3.8) 

where  r  >  1  is  a  rational  number  that  remains  constant  over  all  iterations,  and  the  integer 
rrik  satisfies  0  <  rrik  <  mmax  for  some  fixed  integer  mmax  >  0.  However,  if  the  iteration  is 
unsuccessful,  then  the  mesh  parameter  is  updated  as 

Afc+1  =  rmfc  Afc  (3.9) 

where  r  >  1  is  a  rational  number  that  remains  constant  over  all  iterations,  rmfc  €E  (0, 1), 
and  the  integer  rrik  satisfies  mm\n  <  rrik  <  —  1  for  some  fixed  integer  mm in. 

The  Audet  and  Dennis  [5]  MGPS  algorithm  for  deterministic  bound  constrained  opti¬ 
mization  is  shown  in  Figure  3.2.  Under  the  mild  assumptions  that  the  level  set 

£q(zo)  =  {x  G  D  :  f(x)  <  f(x 0)}  (3.10) 

is  compact  and  the  objective  function  /  is  continuously  differentiable  over  a  neighborhood  of 
Cq{x0)  when  the  discrete  variables  are  fixed,  and  the  rule  for  selecting  directions  conforms 
to  Oc  for  some  e  >  0,  then  the  MGPS  algorithm  converges  to  a  point  satisfying  the  first- 
order  necessary  conditions  for  optimality. 
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Mixed  Variable  Generalized  Pattern  Search  (MGPS)  Algorithm 

Initialization:  Choose  a  feasible  starting  point  xq  such  that  fn(x o)  <  oo. 

Set  a  discrete  neighbor  set  J\f  and  extended  poll  trigger  £  >  0. 

Let  D  be  a  positive  spanning  set. 

Let  the  Mo  C  be  the  mesh  defined  by  mesh  size  parameter  Ao  >  0  and  Do  £  D. 

Set  the  iteration  counter  k  to  0. 

For  k  =  0, 1, ... 

1.  Set  extended  poll  trigger  £fc  >  £. 

2.  SEARCH  Step  (Optional):  Employ  some  finite  strategy  seeking  an  improved  mesh  point; 
i.e.,  xk+i  €  Mk  such  that  fn(xk+ i)  <  fn(xk). 

3.  poll  Step:  If  the  search  step  does  not  find  an  improved  mesh  point,  evaluate  fn  at 
the  points  in  Pk{xk )  U  Af(xk)  until  an  improved  mesh  point  xk+i  is  found  (or  until  done). 

4.  EXTENDED  POLL  step:  If  the  SEARCH  and  POLL  steps  did  not  find  improved  mesh 
point,  evaluate  /  at  points  in  Xk(£k)  until  either  an  improved  mesh  point  xk+\  is  found  (or 
until  done). 

5.  Update:  If  SEARCH,  POLL  or  EXTENDED  POLL  finds  an  improved  mesh  point,  then 
update  Xfc+i,  and  set  A^+i  =  rmkAk  >  Ak  where  r  >  1  is  a  rational  number  that  remains 
constant  over  all  iterations,  and  the  integer  mk  satisfies  0  <  mk  <  mmax  for  some  fixed  integer 
^max  A  0; 

Otherwise,  set  xk+i  =  xk,  and  set  Afc+i  =  rmkAk  <  Ak  where  r  >  1  is  a  rational  number  that 
remains  constant  over  all  iterations,  Tmk  £  (0, 1),  and  the  integer  mk  satisfies  m„,in  <  mk  <  —  1 
for  some  fixed  integer  mm i„. 

6.  Terminate  the  algorithm  if  the  stopping  criterion  is  met  or  if  the  budget  of  function 
evaluations  is  reached;  otherwise,  return  to  step  1. 

Figure  3.2.  MGPS  Algorithm  for  Deterministic  Optimization  (adapted  from  Abramson  [2]) 

3.5  Proposed  Modifications 

Modifications  for  improving  the  efficiency  of  MGPS  and  extending  the  applicability 
of  NOMADm  to  stochastic  simulations  are  now  presented.  The  following  two  subsections 
describe  both  the  purpose  behind  and  the  incorporation  of  the  modifications  presented  in 
Section  1.2. 
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3. 5. 1  Search  Approach 

As  noted  in  Section  2.4.2,  the  SEARCH  step  in  the  GPS  algorithm  allows  the  user  the 
flexibility  to  employ  any  finite  heuristic  in  an  attempt  to  quickly  find  improvement.  While 
the  SEARCH  step  contributes  nothing  to  the  convergence  theory,  a  wise  choice  can  greatly 
increase  efficiency  and  even  affect  the  quality  (Booker  et  al.  [9]). 

For  MVP  problems,  Audet  and  Dennis  [5]  retained  the  SEARCH  step  in  the  MGPS 
algorithm.  As  noted  in  Section  3.4,  because  the  mesh  is  defined  across  all  the  possible 
categorical  settings,  the  search  can  be  applied  globally.  If  the  SEARCH  step  is  unsuccessful  in 
finding  an  improved  mesh  point,  the  algorithm  simply  proceeds  to  the  next  step.  However, 
if  the  SEARCH  step  locates  a  improved  mesh  point,  the  incumbent  is  replaced,  Afe  is  updated, 
and  the  current  iteration  ends.  Thus,  an  efficient  search  can  lead  to  a  reduction  in  function 
evaluations  to  reach  a  local  optimizer. 

The  need  for  a  modification  of  the  MGPS  algorithm  can  be  illustrated  by  the  case  in 
which  a  SEARCH  step  would  locate  a  point  w  in  a  discrete  plane  with  different  categorical 
settings  than  the  incumbent.  Under  the  current  algorithm,  a  SEARCH  step  is  only  per¬ 
formed  prior  to  the  POLL  step;  thus,  a  user  is  limited  to  the  discrete  planes  on  which  a 
search  heuristic  may  be  applied.  Thus,  under  the  standard  application  of  MGPS,  the  user 
would  have  to  perform  a  complete  MGPS  iteration  before  performing  the  SEARCH  step  that 
would  locate  the  point  w.  The  proposed  change  to  the  MGPS  algorithm  is  to  include  an 
EXTENDED  SEARCH  step  that  provides  the  user  the  ability  to  employ  a  finite  search  heuris¬ 
tic  on  any  of  the  discrete  planes  before  the  EXTENDED  POLL  step  is  performed. 

The  EXTENDED  SEARCH  step  can  increase  the  efficiency  of  the  algorithm  not  only  by 
reducing  the  number  of  required  iterations,  but  also  by  reducing  the  sampling  requirements 
when  surrogates  are  used.  Since  the  EXTENDED  SEARCH  step  occurs  after  the  POLL  step, 
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a  user  could  choose  to  construct  surrogates  only  for  those  discrete  planes  that  indicate 
areas  of  possible  improvement,  i.e.  those  that  correspond  to  discrete  neighbors  that  trigger 
extended  polling.  Because  a  space-filling  set  of  points  is  commonly  used  in  constructing  an 
initial  surrogate,  the  EXTENDED  SEARCH  step  provides  a  means  for  constructing  surrogates 
only  as  required,  without  doing  so  for  every  discrete  plane. 

Another  user  option  that  this  step  introduces  is  the  extended  polling  of  points  gener¬ 
ated  by  the  EXTENDED  SEARCH  step.  Suppose  that  the  point  w  does  not  provide  strict 
improvement  over  the  incumbent  Xk  but  provides  strict  improvement  over  the  discrete 
neighbor  of  the  incumbent  y  G  jV(xfc).  Since  f{w)  >  f(xk),  the  point  w  is  not  considered 
an  improved  mesh  point  and  the  algorithm  would  progress  from  a  SEARCH  step  to  a  POLL 
step.  However,  since  the  objective  function  value  of  the  point  w  indicates  a  possible  area  of 
improvement,  the  user  may  consider  polling  around  the  point  w  in  addition  to  the  discrete 
neighbor  y  during  the  EXTENDED  POLL  step.  From  a  convergence  standpoint,  it  is  impor¬ 
tant  to  note  that  one  cannot  directly  replace  the  discrete  neighbor  y  in  the  EXTENDED  POLL 
with  a  point  in  the  discrete  plane  that  provides  strict  improvement  with  respect  to  the  dis¬ 
crete  neighbor,  such  as  the  point  w.  If  the  last  iterate  of  the  EXTENDED  POLL  initiated  at 
the  point  x  is  denoted  as  Zk{x ),  then  there  is  no  guarantee  that  f(zk{w ))  <  f(zk(y))- 

In  a  manner  similar  to  the  SEARCH  step,  the  EXTENDED  SEARCH  step  provides  the  user 
the  ability  to  employ  a  finite  search  heuristic  to  accelerate  the  convergence  of  the  algorithm 
without  affecting  the  underlying  convergence  theory.  As  a  result,  the  computational  effi¬ 
ciency  of  the  MGPS  algorithm  can  be  increased  by  including  the  EXTENDED  SEARCH  step 
without  requiring  new  convergence  results. 
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3.5.2  R&S  Procedure 


Sriver  [55]  showed  that  under  mild  assumptions  a  R&S  procedure  can  be  used  in  a 
MGPS  algorithm  for  stochastic  systems  to  provide  almost  sure  convergence  to  an  appro¬ 
priately  defined  first-order  stationary  point.  The  assumptions  can  be  divided  into  those 
relating  to  the  algorithm  (presented  for  the  MGPS  algorithm  in  Section  3.3)  and  those  re¬ 
lating  to  the  statistical  properties  of  the  problem  and  R&S  procedure  selected  (provided 
in  this  section).  Just  as  in  Section  3.2,  the  special  structure  of  MVP  problems  requires  a 
revision  of  almost  sure  convergence  to  include  the  mixed  variable  domain  found  in  Sriver’s 
work. 

Definition  3.12  Let  12  C  (Mn‘  x  Znd)  be  a  mixed  variable  domain.  A  sequence  of 
multivariate  random  vectors  { X *,}  converges  almost  surely  (a.s.)  to  the  limit  point  x  G  12 
if,  for  every  e  >  0,  there  exists  a  positive  integer  N  such  that  Pr{V(?  =  xd}  =  1  and 
Pr{||V£  -  xc||  <  e}  =  1  for  all  k  >  N. 

As  part  of  numerical  testing,  Sriver  [55]  performed  a  comparative  analysis  of  com¬ 
peting  direct  search  methods  under  various  R&S  procedures;  namely,  Rinott’s  two  stage 
procedure  [51],  a  screen-and-select  procedure  of  Nelson  et  al.  [45],  and  Sequential  Selection 
with  Memory  (SSM)  of  Pichitlamken  and  Nelson  [48] .  Since  SSM  was  reported  from  the 
comparative  analysis  to  offer  performance  advantages  over  the  other  R&S  procedures,  it 
was  chosen  as  the  R&S  procedure  to  implement  in  this  research. 

As  noted  in  Section  2.5,  SSM  is  a  fully  sequential  procedure  specifically  designed  for 
iterative  search  routines,  in  which  one  sample  at  a  time  is  taken  from  every  candidate  still 
in  play  and  eliminates  clearly  inferior  ones  as  soon  as  their  inferiority  is  apparent.  In  order 
to  locate  inferior  candidates,  SSM  performs  a  pairwise  statistical  test  at  every  iteration  for 
each  of  the  candidates.  By  removing  the  candidates  that  are  statistically  inferior  to  all  the 
other  members  of  the  candidate  set,  SSM  provides  a  computationally  efficient  procedure  to 
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Figure  3.3.  Triangular  Continuation  Region  for  SSM 
select  the  best  candidate.  Another  advantage  of  SSM  is  the  utilization  of  previously  stored 
sampling  data,  which  alleviates  some  of  the  computational  burden  of  obtaining  additional 
simulation  outputs  at  each  iteration  of  the  optimization  algorithm. 

The  key  in  understanding  SSM  is  the  concept  of  the  triangular  continuation  region, 
which  is  now  detailed.  It  is  constructed  as  the  result  of  an  initial  stage  of  sampling  used 
to  estimate  the  variances  between  each  pair  of  candidates.  Using  the  notation  of  Section 
2.5,  with  XiP,  p  =  1,  ...rmax,  denoting  the  replications  of  candidate  Xi,  for  i  =  1, ...,  k,  the 
estimate  of  the  variances  between  each  pair  of  candidates  a |  -  =  Y[Xip  —  Xjp]  is  calculated  as 

Si  =  ]TS°  (FiP  -  Fjp  -  [Fi(s0)  -  ^-(so)])2  (3.11) 

J  so  —  t  c — p=1 

where  i  and  j  =  1, ...,  k,  So  is  the  user-defined  number  of  initial  replications  and 
Fi(so)  =  ^Jp°=i  ^ip- 


60 


The  resulting  variance  estimates,  together  with  the  user-defined  indifference  zone  pa¬ 
rameter  5  and  level  of  significance  parameter  a,  establish  the  parameters  of  the  triangular 
continuation  area;  namely, 


.  6  v{s0  -  1  )S; 

A  -  2c  and  atj  -  ^  z  A) 


(3.12) 


where  the  recommended  values  for  c  €  R  and  77  €  R  are  1  and  ((/c  —  l)/(2a))2/(So_1)_1,  re¬ 
spectively.  The  general  triangular  continuation  region  for  the  SSM  procedure  is  illustrated 
in  Figure  3.3,  where  r  represents  the  number  of  samples  of  the  SSM  procedure. 

Recall  from  Section  2.5  that  Hartmann  [25]  replaced  the  large  deviation  bound  by  a 
Brownian  motion  bound  on  the  acceptance  region.  Let  B(t]S,  cr2)  denote  the  Brownian 
motion  process  with  E[£?(i;  S,  cr2)]  =  St  and  V[B(t;  <5,  cr2)]  =  a2t.  Hartmann  [25]  demon¬ 
strated  that  the  Brownian  motion  process  B(t\ S,  a2)  with  S  >  0,  m  >  0,  A  >  0,  and  stopping 
time  defined  as 


T  =  inf{t  :  |J3(f;  5,  cr2)|  >  A (m  —  t)} 


(3.13) 


exits  toward  the  lower  boundary  of  the  continuation  region  with  probability 


Pr{B{t]5,a 2)  <  0} 


roo  e  2\e/cr  £  m  (mS / o)  .  de 

.Loo  l  +  e~2^{ 


(3.14) 


Since  the  triangular  region  defines  an  incorrect  selection  as  breaking  the  lower  bound 
of  the  triangular  region,  the  total  probability  that  the  SSM  procedure  selects  the  wrong 
candidate  is  given  by 


^iE[Pr{R(t;(5,(T2fc)<0}] 


(3.15) 
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Fabian  [22]  showed  that  when  A  =  8/2c  and  m  =  a^/A,  Equation  (3.15)  is  equivalent 


to 


2c-  1 


)(*-A)}] 


(3.16) 


where  /  is  the  indicator  function. 

By  substitution  of  the  recommended  values  for  A  and  o^-  from  Equation  (3.12)  and  un¬ 
der  the  assumption  that  the  candidates  are  normally  distributed,  the  expression  in  Equation 
(3.16)  is  equivalent  to  a,  which  produces  the  desired  level  of  significance  and  indifference 
given  by  Equation  (2.21). 

Using  this  construction  of  the  triangular  continuation  region,  the  maximization  pro¬ 
cedure  of  SSM  (see  Pichitlamken  and  Nelson  [49])  can  be  converted  into  the  required  min¬ 
imization  selection  procedure  for  this  research.  The  resulting  procedure,  using  the  recom¬ 
mended  values  for  c  and  77,  is  given  in  Figure  3.4. 

As  part  of  the  convergence  analysis  for  the  MGPS-RS  algorithm,  Sriver  [55]  demon¬ 
strated  that  by  using  a  MGPS  algorithm,  almost  sure  convergence  for  stochastic  systems 
is  guaranteed  by  enforcing  the  following  assumptions: 


Al:  All  iterates  X *.  produced  by  the  MGPS  algorithm  lie  in  a  compact  set. 

A2:  The  objective  function  /  is  continuously  differentiable  with  respect  to  the  continuous 
variables. 

A3:  For  each  set  of  discrete  variables  Xd,  the  corresponding  set  of  directions  Dl  =  G'jE,, 
as  defined  in  (3.1),  includes  tangent  cone  generators  for  every  point  in  Xc. 

A4:  The  rule  for  selecting  directions  D\.  conforms  to  Xc  for  some  e  >  0  (see  Definition  3.9). 


62 


Sequential  Selection  with  Memory  Procedure 

Initialization:  Choose  a  number  of  stage  zero  samples  so  >  2. 

Let  Sic  denote  the  number  of  previously  stored  responses  of  candidate  X(. 

For  any  member  AQ  6  with  a  stored  responses  Sic  <  so,  collect  so  —  sic  more 

responses. 

Choose  a  significance  level  ^  <  1  —  a  <  1  and  indifference  zone  parameter  5. 

1.  Estimate  u?-  =  V[ACp  —  Xjp]  with 

i  SO 

Sfj  =  ——  -  Fip  [^(so)  -  ^'(so)])2 

p= 1 


2.  Compute  the  procedure  parameters  as 


A  =  -  and  an  = 
2 


7i(s0  -  1  )Sfj 
4(5  -  A) 


where  rj  =  {{k  —  l)/(2a))2^*—  1)  b 

3.  Let  Nij  =  ,  IVj  =  max,;y,y  {Aiy},  IV  =  maxi<,(<fc  iV(.  If  s0  >  IV,  then  stop  and  select 

the  candidate  with  the  smallest  sample  mean  as  the  best.  Otherwise,  let  I  =  {1, . . . ,  k}  be  the 
set  of  surviving  solutions,  set  counter  r  =  so  and  proceed  to  Step  4. 

4.  Set  /old  =  I.  Let 


I  =  li  :  *  6  /old  and  JL  <  min  (Ra  +  an) - 

\  -  ie/old,j/iV  3  3>  2 

where 

r>  .  _  /  X)p=l  Xjp ,  if  Sic  <  r 

j  I  ^(S=iXjp).  otherwise 

5.  If  |/|  =  1,  then  stop  and  report  the  only  survivor  as  the  best;  otherwise,  for  each  candidate 
Xi  G  {i  G  I  :  Sic  <  r  +  1}  collect  one  additional  response  sample  and  set  r  =  r  +  1. 

If  r  =  N  +  1,  terminate  the  procedure  and  select  the  solution  in  /  with  the  smallest  sample 
mean  as  the  best;  otherwise,  for  each  i  G  I  go  to  Step  4. 


Figure  3.4.  Sequential  Selection  with  Memory  (adapted  from  Pichitlamken  and  Nelson  [48]) 


A5:  For  each  i  =  1,2,  the  responses  {FipYp={  are  independent,  identically  and 

normally  distributed  random  variables  with  mean  /(AQ)  and  unknown  variance  af  <  oc, 
where  of  ^  a ^  whenever  £  ^  q. 

A6:  The  sequence  of  significance  levels  {ar}  satisfies  ar  <  oo,  and  the  sequence  of 
indifference  zone  parameters  {5r}  satisfies  lim^oo  Sr  =  0. 
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A7:  For  the  rth  R&S  procedure,  the  probability  of  correctly  selecting  the  best  candidate 
X[]|  is  at  least  1  —  ar  whenever  /(Yjjj)  —  /(Y[ ij)  >  8r  for  any  i  €  {2, 3, ,  k}. 

A8:  For  all  but  a  finite  number  of  MGPS  iterations  and  sub-iterations,  the  best  solution 
Ap]  is  unique;  i.e.,  /(Apj)  ^  f(X^q)  for  all  i  6  {2,  3, . . . ,  A:},  where  each  member  of  the 
candidate  set  lies  on  the  mesh  defined  at  iteration  k. 

Assumption  A1  is  a  standard  assumption  of  the  MGPS  algorithm.  A  simplifying 
assumption  on  the  target  class  of  problems  for  this  research  is  represented  by  A2.  As¬ 
sumptions  A3  and  A4  are  required  for  convergence  of  the  MGPS  algorithm.  Assumption 
A5  is  a  common  requirement  for  R&S  techniques  and  can  be  achieved  in  simulation  by 
batching  the  output  data  or  the  use  of  sample  averages  of  independent  replications  Nelson 
et  al.  [45] .  Assumption  A6  is  a  requirement  imposed  by  the  convergence  theory  presented 
by  Sriver  [55].  By  requiring  the  R&S  parameters  to  decay,  the  incumbents  are  forced  to 
provide  almost  sure  convergence  to  a  limit  point.  Assumption  A7  enforces  the  iterative 
correct  selection  guarantee  of  the  R&S  procedure  from  Equation  (2.21).  Finally,  assump¬ 
tion  A8  is  required  to  ensure  that  the  indifference  zone  condition  is  eventually  met  during 
the  course  of  the  iteration  sequence. 

Since  the  proposed  modification  to  the  MGPS  algorithm  does  not  alter  the  underly¬ 
ing  convergence  theory,  implementation  of  the  EXTENDED  SEARCH  step  does  not  violate 
assumptions  A1-A4.  Additionally,  by  placing  an  update  step  within  the  modified  algo¬ 
rithm,  in  a  similar  manner  as  Sriver’s  [55]  MGPS-RS  algorithm  ,  assumption  A6  is  main¬ 
tained.  The  remaining  assumptions  A5,  A7,  and  A8  can  be  satisfied  by  proper  selection 
of  the  simulation  model  and  R&S  procedure.  Therefore,  the  modified  MGPS  algorithm, 
presented  in  Figure  3.5,  provides  almost  sure  convergence  to  a  first-order  stationary  point 
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Modified  Mixed  Variable  Generalized  Pattern  Search  Algorithm 

Initialization:  Choose  a  feasible  starting  point  xq  such  that  fn(x o)  <  oo. 

Set  a  discrete  neighbor  set  J\f  and  extended  poll  trigger  £  >  0. 

Let  D  be  a  positive  spanning  set. 

Let  the  Mo  C  be  the  mesh  defined  by  mesh  size  parameter  Ao  >  0  and  Do  £  D. 

Set  the  R&S  parameters  ao  £  (0, 1),  and  5o  >  0. 

Set  the  iteration  counter  k  to  0. 

Set  the  R&S  counter  r  to  0. 

For  k  =  0, 1, ... 

1.  Set  extended  poll  trigger  £fe  >  £. 

2.  SEARCH  Step  (Optional):  Using  an  R&S  procedure  with  parameters  ar  and  5r  for 
candidate  evaluation,  employ  some  finite  strategy  seeking  an  improved  mesh  point. 

Update  ar+i  <  ar,  dr+i  <  $n  and  r  =  r  +  1. 

3.  POLL  Step:  If  the  SEARCH  step  does  not  find  an  improved  mesh  point,  evaluate  /q  at 
the  points  in  Pk{xk)  U  Af(xk),  using  an  R&S  procedure  with  parameters  ar  and  5ri  until  an 
improved  mesh  point  Xk+ i  is  found  (or  until  done). 

Update  ar+i  <  ar,  Sr+i  <  Sr,  and  r  =  r  +  1. 

4.  extended  search  Step  (Optional):  Using  an  R&S  procedure  with  parameters  ar 

and  Sr  for  candidate  evaluation,  employ  some  finite  strategy  seeking  an  improved  mesh  point. 
Update  ar+i  <  ar,  dr+i  <  5ri  and  rmr  +  1. 

5.  extended  poll  step:  If  the  search,  poll,  and  extended  search  steps  did 

not  find  improved  mesh  point,  evaluate  /  at  points  in  Afc(£fc),  using  an  R&S  procedure  with 
parameters  ar  and  6r,  until  either  an  improved  mesh  point  Xk+i  is  found  (or  until  done). 

Update  ar+i  <  ar,  <5r+i  <  5r,  and  r  =  r  +  1. 

6.  Update:  If  SEARCH,  POLL  or  EXTENDED  POLL  finds  an  improved  mesh  point,  then 
update  Xfe+i,  and  set  A^.+i  =  rmfcAfe  >  A k  where  r  >  1  is  a  rational  number  that  remains 
constant  over  all  iterations,  and  the  integer  mk  satisfies  0  <  mk  <  mmax  for  some  fixed  integer 

^max  A  {}: 

Otherwise,  set  Xk+i  =  Xk ,  and  set  A^+i  =  rmfc  A^  <  A/,  where  r  >  1  is  a  rational  number  that 
remains  constant  over  all  iterations,  rmk  £  (0, 1),  and  the  integer  mk  satisfies  mmin  <  mk  <  —1 
for  some  fixed  integer  mmi„. 

7.  Terminate  the  algorithm  if  the  stopping  criterion  is  met  or  if  the  budget  of  function 
evaluations  is  reached;  otherwise,  return  to  step  1. 

Figure  3.5.  Modified  MGPS  Algorithm  for  Simulation  Optimization 
for  stochastic  simulation  systems,  and  converges  to  a  first-order  stationary  point  for  deter¬ 
ministic  simulation  systems. 
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3.6  Implementation  Considerations 

With  the  modified  MGPS-RS  algorithm  explained,  issues  concerning  the  implementa¬ 
tion  of  the  modified  code  into  NOMADrn  may  now  be  discussed.  Although  SSM  allows  for 
the  re-use  of  historical  data  of  all  previously  sampled  iterates  in  future  R&S  procedures,  in 
this  research  only  the  data  for  the  current  incumbent  is  maintained.  Since  NOMADrn  was 
originally  developed  for  deterministic  simulations,  data  corresponding  to  previously  sam¬ 
pled  iterates  are  maintained  in  a  cache  structure  to  prevent  re-sampling  of  the  same  point, 
thus  reducing  the  overall  number  of  required  function  evaluations.  However,  the  random¬ 
ness  in  stochastic  problems  makes  it  at  least  possible  that  a  previously  evaluated  trial  point 
is,  in  truth,  a  better  point,  and  would  be  discovered  to  be  so  with  more  replications.  If  the 
R&S  parameters  are  not  relatively  small  enough  when  an  iterate  is  initially  sampled,  the 
data  stored  in  the  cache  of  NOMADrn  may  be  inaccurate.  Since  the  costs  of  storing  com¬ 
plete  iterate  data  can  be  prohibitive,  the  objective  function  values  of  stochastic  simulation 
responses  are  estimated  and  entered  in  the  cache  as  the  sample  mean;  thus  the  estimate 
can  be  inaccurate  unless  R&S  parameters  are  relatively  small.  Then  in  future  samplings, 
when  the  R&S  parameters  are  tight  enough  to  produce  an  accurate  estimate,  the  iterate 
will  never  be  re-sampled  or  eligible  to  replace  the  incumbent  point  since  it  is  already  in 
the  cache.  This  issue  is  partially  remedied  by  the  assumption  that  the  initial  point  is  not 
too  close  to  the  optimal  and  by  the  fact  that  the  mesh  is  refined  over  time,  thus  allowing 
for  the  possibility  of  future  samplings  that  are  near  the  cached  point  to  be  evaluated  accu¬ 
rately.  Although  the  incumbent’s  sample  mean  is  entered  into  the  cache,  the  sample  data 
for  the  incumbent  is  maintained  outside  of  the  cache.  Once  the  incumbent  is  replaced,  the 
historical  data  is  updated  by  the  new  incumbent.  The  reason  that  this  exception  is  made 
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for  the  incumbent  is  that  the  incumbent  is  the  most  frequently  compared  point  and  thus 
this  is  expected  to  produce  the  largest  computational  savings. 

The  use  of  an  EXTENDED  SEARCH  step  also  introduces  an  issue  when  considering 
modification  to  the  NOMADrn  code.  Since  the  original  SEARCH  was  performed  before  the 
POLL  step,  NOMADrn  was  designed  to  only  sample  on  the  local  mesh  of  the  incumbent. 
Although  the  EXTENDED  SEARCH  allows  for  a  more  global  search,  sections  of  the  SEARCH 
routine  had  to  be  changed  to  avoid  the  creation  of  a  surrogate  across  different  categorical 
settings.  Specifically,  at  each  iteration,  the  cache  is  now  filtered  to  include  only  those 
iterates  that  have  the  same  categorical  variable  values  as  the  incumbent.  Additionally, 
since  the  number  of  function  evaluations  needed  to  create  a  surrogate  for  every  discrete 
neighbor  can  quickly  exceed  the  number  of  function  evaluations  saved  by  the  use  of  the 
surrogate,  a  surrogate  is  only  built  when  a  discrete  neighbor  point  triggers  the  extended 
polling  condition  (see  Equation  (3.5)).  This  avoids  the  building  of  surrogates  for  every 
possible  discrete  neighbor,  as  is  done  by  Sriver  [55],  and  is  more  general  than  the  original 
MGPS  algorithm. 

3. 7  Summary 

This  chapter  detailed  the  requirements  for  the  MGPS  algorithm  to  converge  to  a  point 
satisfying  the  first-order  necessary  condition  in  the  context  of  a  mixed  variable  domain. 
Additionally,  the  modified  algorithm  with  the  sequential  selection  with  memory  procedure 
was  presented.  In  the  next  chapter,  the  efficiency  of  using  the  modified  algorithm  within 
the  NOMADrn  is  examined  through  a  computational  evaluation. 
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Chapter  4  ~  Computational  Evaluation 


In  this  chapter,  the  proposed  optimization  algorithm  is  implemented  on  test  problems 
to  characterize  its  applicability  to  general  engineering  design  problems.  The  surrogate 
approaches  that  are  used  in  the  testing  are  described  in  order  to  explain  their  selection  and 
purpose.  Then,  the  design  of  the  experimental  investigation  is  presented,  including  the 
test  problem  and  series  of  runs  performed.  Finally,  the  results  are  analyzed  with  a  focus 
on  the  computational  efficiency  of  the  modified  algorithm. 

4 . 1  Research  Surrogates 

As  mentioned  in  Section  2.5,  an  inexpensive  surrogate  constructed  by  interpolating 
or  smoothing  known  values  can  improve  the  efficiency  of  the  optimization  algorithm  by 
reducing  the  number  of  function  evaluations  required  to  locate  a  point  that  produces  a 
decrease.  In  order  to  limit  the  number  of  additional  function  evaluations,  the  surrogates 
used  in  this  research  are  constructed  as  approximations  of  the  true  objective  function  and 
are  based  on  previously  computed  responses  that  are  stored  by  NOMADm  in  a  cache.  The 
resulting  surface  is  then  used  in  the  SEARCH  step  to  select  candidate  points  that  actively 
seek  regions  of  improved  mesh  points. 

Although  the  use  of  a  surrogate  requires  an  investment  of  initial  function  evaluations 
and  each  SEARCH  step  requires  additional  function  evaluations,  if  the  surrogate  can  suc¬ 
cessfully  locate  areas  of  improvement,  then  the  surrogate  may  accelerate  the  convergence 
of  the  algorithm  to  a  stationary  point.  As  a  result,  the  use  of  surrogates  may  reduce  the 
overall  number  of  function  evaluations  for  an  optimization  algorithm  to  produce  an  appro¬ 
priate  solution.  To  increase  the  likelihood  that  the  surrogate  is  able  to  find  such  areas 
of  improvement,  and  thus  contribute  positively  to  the  optimization  algorithm,  surrogates 
are  traditionally  selected  to  incorporate  some  knowledge  of  the  underlying  system.  Since 
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the  responses  in  this  optimization  problem  are  assumed  to  be  generated  by  a  "black  box" 
system,  the  selection  of  surrogates  is  limited  by  the  general  information  that  the  assump¬ 
tions  of  the  problem  type  provide.  In  particular,  the  objective  function  is  assumed  to  be 
continuously  differentiable;  thus  this  research  uses  surrogates  that  assume  the  underlying 
approximated  surface  is  smooth.  Two  general  methods  that  use  this  smoothness  assump¬ 
tion  are  the  nonparametric  approach  of  kernel  regression  and  the  parametric  approach  of 
Kriging. 

4-1.1  Kernel  Regression 

As  presented  by  Hardle  [24],  the  basic  idea  of  smoothing  of  a  dataset  {(W,  Tj)}™=1 
involves  the  approximation  of  the  mean  response  curve  /  in  the  regression  relationship 


Yi  =  f(Xi)  +  ei 


(4.1) 


where  i  =  1 n  and  in  the  context  of  this  research,  Xi  are  design  vectors,  Yj  are  the 
averaged  responses  of  the  associated  Xi  vector,  and  Ei  is  random  noise  of  the  system.  The 
underlying  assumption  is  that,  if  /  is  smooth,  then  an  observation  Xi  near  x  should  contain 
information  about  the  value  of  /  at  x.  Kernel  regression  is  a  means  of  quantifying  this  linear 
relationship.  The  approximation  of  the  mean  response  curve  /  is  traditionally  calculated 
by  the  Nadaraya- Watson  estimator  ([42]  and  [62])  that  uses  a  weighted  sequence  based  on 
a  kernel.  For  the  two  dimensional  case,  the  Nadaraya- Watson  estimator  is  given  as 


=  £?=i  Khjx-XjYj 

£"=1  Kh{x  -  Xi) 


(4.2) 
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where  Kh(u)  =  h~1K(u/h )  is  the  kernel  with  bandwidth  parameter  h  >  0.  The  kernel  is 
defined  as  a  continuous,  bounded  and  symmetric  real  function  K  which  integrates  to  unity 
(Hardle  [24]);  that  is, 


/+oo 

K(u)du  =  1  (4-3) 

-OO 

The  kernel  defines  the  shape  of  the  surface  between  known  response  points.  In  this 
research,  it  is  assumed  that  the  decay  from  a  known  response  is  normally  distributed; 
thus,  the  Gaussian  kernel,  K(u)  =  exp(— u2/2)/\/2tt,  is  used.  The  bandwidth  of  the  kernel 
represents  how  much  weight  is  placed  on  the  interpolation  with  respect  to  the  distance  from 
the  known  response  points.  Once  the  kernel  is  chosen,  the  estimator  can  be  re-written  in 
a  form  that  illustrates  the  dependence  of  the  estimator  function  on  the  bandwidth.  For 
this  research,  the  resulting  estimator  function  can  be  expressed  as 

N 

Z  ^exp(-S) 

f(Xj,  h)  =  -  (4.4) 

Z  exp(^^) 

where  D'j  =  \\x  —  Xj\\?2  represents  the  squared  Euclidean  distance  from  x  to  Xj.  As  Sriver 
noted  [55],  the  bandwidth  h  essentially  determines  the  degree  of  nonlinearity  in  the  surro¬ 
gate  function.  As  h  increases,  the  curvature  in  /  decreases  such  that,  when  h  is  very  large, 
/  is  a  constant  that  assumes  the  mean  value  of  all  Fj\  i.e.,  f  — >  YljLi  T)  as  h  — *  oo. 
Smaller  values  of  h  allow  more  curvature  in  /  but  can  cause  outliers  to  have  too  great  an 
effect  on  the  estimate.  As  h  — >  0,  /  assumes  the  value  of  Fj  for  the  corresponding  xj  that 
is  nearest  the  estimation  point. 


70 


Since  the  accuracy  of  the  estimator  depends  mainly  on  the  bandwidth  parameter  h. 
several  bandwidth  selection  procedures  have  been  introduced  (see  Hardle  [24]).  The  one 
employed  in  this  research  is  the  leave-one-out  cross-validation  method,  in  which  the  sum 
of  squares  error  (SSE) 


N 

SSE{h)  =  YJU{xvh)-Fjf  (4.5) 

3= 1 

is  calculated  by  omitting,  in  turn,  each  design  vector  Xj  in  Equation  (4.4).  The  resulting 
function  SSE  can  then  be  optimized  separately  to  determine  an  appropriate  bandwidth 
setting  for  the  parameter  h. 

4.1.2  Kriging 

Similar  to  kernel  regression,  Kriging  uses  the  smoothness  assumption  of  the  objective 
function  to  estimate  responses  under  the  context  that  the  closer  the  inputs  are,  the  more 
positively  correlated  are  their  outputs  (van  Beers  and  Kleijnen  [61]).  Although  the  over¬ 
all  assumption  of  the  surface  is  the  same,  the  generation  of  estimation  functions  is  quite 
different.  Not  only  is  Kriging  a  parametric  method,  but  it  is  also  an  exact  interpolator , 
in  that  it  produces  predicted  values  at  observed  input  values  that  are  exactly  equal  to  the 
simulated  output  values. 

Although  Kriging  was  originally  developed  as  an  analysis  method  for  mining  applica¬ 
tions  (see  Krige  [32]),  it  has  become  a  popular  multidisciplinary  method  of  estimation  (see 
Currin  et  al.  [15]).  Not  only  have  Kriging  interpolation  functions  been  shown  to  provide 
better  fitting  models  in  multi-dimensional  domains  than  polynomial  interpolants,  they  of¬ 
ten  exhibit  fewer  oscillations,  which  can  cause  polynomial  fitting  techniques  to  fail. 

As  in  kernel  regression,  Kriging  involves  the  approximation  of  the  mean  response  curve 
/  in  the  regression  relationship 


71 


Y(Xi)  =  f(Xi)+e(Xi) 


(4.6) 


where  i  =  1, n,  the  Xt  are  design  vectors,  Y-i  are  the  averaged  responses  of  the  associated 
Xi,  and  e  is  the  random  error.  However,  while  kernel  regression  treats  each  error  term  as 
an  independent  variable,  Kriging  considers  each  error  to  be  a  dependent  random  variable 
with  a  known  distribution  and  zero  mean.  By  assuming  the  random  error  follows  a  known 
distribution,  Kriging  is  able  to  model  the  covariance  of  the  error  function.  The  covariance 
is  modeled  as  a  second-order  stationary  process;  i.e.,  the  means  and  variances  are  constants 
and  the  covariances  of  the  outputs  depend  only  on  the  distance  between  the  inputs.  For 
example,  a  general  covariance  function  can  be  given  as 


cr2f  exp (—6  ||xj  —  Xj  ||) 


CovifiX^JiXj)] 


(4.7) 


where  a2  is  a  scalar  process  variance,  i  and  j  =  1  with  i  /  j,  and  0  is  a  weighting 
factor  for  the  influence  of  surrounding  data  points. 

Because  Kriging  is  based  on  a  regression  model,  Equation  (4.6)  can  be  more  explicitly 
expressed  as 


(4.8) 


where  fj  are  basis  functions  and  /3j  are  the  corresponding  coefficients.  The  use  of  basis 
functions  allows  Kriging  to  assume  various  polynomial  forms  which  provide  more  flexibility 
than  the  default  linear  function.  Thus,  if  information  on  the  underlying  response  surface 
shape  is  known,  then  the  Kriging  model  can  be  tailored  to  take  advantage  of  this  additional 
knowledge. 
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Although  Kriging  allows  great  flexibility  in  the  selection  of  the  estimation  function,  the 
model  parameters  have  to  be  determined  before  the  Kriging  model  can  be  used  to  produce 
response  estimates.  Similar  to  kernel  regression,  these  parameters  are  also  optimized 
separately. 

4-2  Surrogate  Sampling  Design 

To  garner  useful  information  from  the  surrogate  models,  a  structured  approach  must 
be  taken  in  the  selection  of  points  which  are  used  in  the  building  of  the  surrogates.  The 
following  subsections  discuss  the  methods  employed  in  this  research  to  ensure  the  surrogates 
properly  balance  the  benefit  of  the  additional  information  provided  with  the  increase  of 
function  evaluations  incurred  by  the  optimization  algorithm. 

4-2.1  Latin  Hypercube  Sampling 

In  order  to  provide  a  good  global  search  of  the  feasible  region,  surrogate  models  require 
response  data  over  the  entire  feasible  region  to  properly  approximate  the  underlying  surface. 
A  simple  and  popular  experimental  design  technique  to  provide  such  data  is  Latin  hypercube 
sampling  (LHS).  Although  LHS  was  invented  for  deterministic  simulation  models  (Mckay 
et  al.  [40]),  it  is  a  useful  tool  for  generating  a  space-filling  set  of  points  from  which  to 
construct  an  initial  surrogate. 

In  traditional  LHS,  each  variable  domain  of  the  feasible  region  is  divided  into  n  intervals 
of  equal  length.  Then,  design  points  are  randomly  placed  within  each  interval  so  that,  for 
each  variable,  a  design  point  appears  exactly  once  in  each  interval.  This  is  convenient  for 
pattern  search,  since  the  feasible  region  is  already  equally  divided  by  the  mesh,  and  thus 
the  design  points  must  simply  be  distributed  properly.  If  the  number  of  desired  design 
points  p  equals  the  number  of  intervals  n  of  the  LHS  design,  then  the  design  is  said  to  be 
of  strength  one. 


73 


Since  the  global  quality  of  the  surrogate  is  initially  determined  by  the  LHS,  designs  of 
strength  two  ( p  =  2 n)  are  used  in  this  research  to  provide  an  initial  search  of  the  feasible 
region  that  is  denser  than  a  traditional  strength  one  design  (p  =  n).  Figure  4.1  illustrates 
LHS  samples  of  strengths  one  and  two  for  a  two-dimensional  design  space.  Since  locating 
and  moving  to  areas  of  possible  improvement  is  important  early  on  in  the  algorithm,  the 
first  SEARCH  step  performed  in  this  research  will  always  be  LHS. 


*1 

Strength  2 

Figure  4.1.  Examples  of  Latin  Hypercube  Samples  of  Strengths  1  and  2. 

4-2.2  The  Merit  Function 

Once  the  surrogate  function  is  built,  it  can  be  utilized  within  the  pattern  search  frame¬ 
work  as  an  inexpensive  means  to  generate  candidate  points  from  the  mesh  M\~.  After  the 
candidate  points  are  evaluated,  the  points  are  added  as  design  sites  for  the  surrogate  to 
enhance  the  accuracy  of  the  surrogate  function.  A  straightforward  rule  for  selecting  can¬ 
didate  points  is  simply  to  minimize  /  on  the  mesh  directly.  However,  using  this  greedy 
approach  may  inhibit  the  overall  accuracy  of  the  surrogate  function  because  the  trial  points 
tend  to  cluster  in  a  particular  region  of  the  design  space  (Sriver  [55]). 


Strength  1 
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Instead  of  directly  estimating  the  response  by  the  estimator  function  f(x),  Sriver 
incorporated  a  bi-objective  merit  function  (see  Torczon  and  Trosset  [59])  of  the  form 

m(x)  =  f(x)  —  A  d(x)  (4.9) 

where  d(x)  =  min  ||x  —  Xj\\  is  the  distance  from  x  to  the  nearest  previously  sampled  design 
j 

site  and  A  >  0  determines  the  relative  weight  placed  on  the  space-filling  objective  for  the 
selection  of  candidate  points.  By  initially  setting  the  A  parameter  as  a  multiple  of  the  mean 
difference  between  initial  design  responses  and  decaying  the  parameter  after  each  SEARCH 
step,  the  merit  function  approach  forces  the  searches  to  perform  an  early  examination 
of  space-filling  points.  The  impact  of  the  merit  function  approach  will  be  examined  in 
computational  testing. 

4-2.3  The  Trust  Region 

An  additional  embellishment  on  the  standard  MGPS-RS  algorithm  that  Sriver  [55] 
performed  was  the  use  of  a  trust  region  of  the  surrogate  model.  Since  kernel  regression 
methods  are  interpolatory,  estimates  of  points  lying  outside  the  sampling  region  approach 
a  value  equal  to  the  mean  response  of  the  nearest  design  site.  Thus,  the  combination  of  a 
kernel  regression  estimator  with  a  large  bandwidth  and  the  merit  function  approach  would 
constantly  force  the  selection  of  points  that  were  distant  from  the  current  design  points. 

To  prevent  this  occurrence,  Sriver  incorporated  a  "trust  region"  and  set  its  radius 
to  one-half  of  the  maximum  Euclidean  distance  between  any  pair  of  initial  design  sites. 
During  the  SEARCH  step,  the  search  is  restricted  to  a  ball  centered  at  the  starting  point 
with  the  prescribed  radius.  The  impact  of  this  option  on  the  quality  of  the  surrogate  will 
also  be  examined  in  computational  testing. 
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4-3  Implementation  Considerations 

In  addition  to  the  general  considerations  presented  in  Section  3.6,  the  use  of  NOMADrn 
as  an  implementation  of  the  modified  algorithm  introduces  specific  encoding  considerations. 
This  section  discusses  the  specific  issues  presented  by  and  resulting  decisions  made  from 
the  use  of  NOMADrn. 

In  order  to  allow  the  user  flexibility  during  the  local  neighborhood  search,  NOMADrn 
offers  a  wide  range  of  direction  sets,  including  an  option  of  a  user-defined  set,  to  be  used  in 
the  generation  of  the  POLL  set.  This  research  considered  the  use  of  the  standard  maximal 
set,  given  by  [7;  —7],  to  ensure  a  rich  local  search  was  performed. 

As  noted  by  Dennis  and  Schnabel  [19],  an  important  consideration  in  many  engineering 
problems  is  the  issue  of  variable  scaling.  Scaling  refers  to  the  transformation  of  variables 
so  that  they  will  have  approximately  the  same  range.  For  badly  scaled  problems,  the 
disparate  ranges  of  the  variables  have  the  effect  of  assigning  unequal  weighting  factors  to 
the  problem  variables.  In  extreme  cases,  the  altering  of  problem  variable  importance  can 
cause  variables  to  be  virtually  ignored  by  the  optimization  problem.  In  order  to  maintain 
the  relative  importance  of  variables,  NOMADrn  provides  the  option  of  logarithmic  scaling 
of  directions,  rather  than  direct  variable  transformations.  By  calculating  the  weighting 
factors  for  each  variable,  the  directions  of  the  POLL  set  can  be  properly  rescaled  within 
each  variable  range.  The  result  of  logarithmic  rescaling  is  that  the  descent  directions  will 
take  into  consideration  the  ranges  of  the  variables.  In  this  research,  a  base-2  logarithm 
transformation  was  used  since  it  will  rescale  the  variables  more  frequently  than  a  base- 10 
logarithm. 

In  the  presence  of  constraints,  consideration  must  be  given  to  the  evaluation  of  infea¬ 
sible  points.  Since  NOMADrn  was  designed  to  handle  nonlinear  constraints  through  the 
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use  of  a  filter,  the  code  already  determines  if  potential  iterates  will  lie  within  the  defined 
constraints.  Since  infeasible  points  may  improve  the  accuracy  of  the  surrogate,  infeasible 
points  were  evaluated  but  not  allowed  to  replace  the  incumbent.  Therefore,  when  the  POLL 
set  contains  directions  that  would  make  a  constraint  active,  directions  which  conform  to 
the  boundary  were  added  to  the  POLL  set  and  the  entire  POLL  set  was  evaluated.  Addition¬ 
ally,  when  stochastic  simulations  are  used,  the  user-defined  number  of  initial  replications, 
so  from  Equation  (3.11),  of  infeasible  points  was  taken.  Since  the  goal  of  the  optimization 
is  to  locate  an  improved  feasible  solution,  the  user-defined  minimal  number  of  replications 
was  used  for  infeasible  points  to  provide  some  variance  reduction  while  preserving  function 
evaluations  for  the  overall  algorithm. 

When  surrogates  are  used  during  the  SEARCH  step,  the  user  must  decide  which  iterates 
should  be  included  in  the  construction  of  the  surrogate.  Although  the  inclusion  of  more 
(or  all)  design  points  may  improve  the  accuracy  of  the  surrogate,  constructing  and/or 
evaluating  a  surrogate  with  a  large  number  of  design  points  can  become  prohibitive  in 
terms  of  computing  speed  or  available  memory.  In  an  effort  to  balance  these  two  concerns, 
only  search  points  and  improved  mesh  points  were  used  during  the  surrogate  construction; 
all  other  previously  sampled  points  are  ignored. 

Finally,  for  mixed  variable  problems,  MGPS  typically  searches  only  those  sections  of 
the  mesh  associated  with  the  categorical  setting  containing  the  incumbent.  However,  the 
flexibility  of  the  SEARCH  step  allows  the  search  of  any  section  of  the  mesh.  For  exam¬ 
ple,  for  each  possible  categorical  setting,  a  surrogate  can  be  built  during  initialization  in 
preparation  for  future  searches,  as  in  Sriver’s  implementation  [55] .  Since  this  may  require 
a  significant  number  of  function  evaluations  for  MVPs  with  many  possible  categorical  set¬ 
ting  combinations,  an  alternative  approach  is  to  build  surrogates  only  as  they  are  needed. 
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In  this  research,  the  result  of  the  POLL  step  is  used  in  determining  which  of  the  discrete 
neighbors  are  to  be  polled.  The  surrogates  are  only  constructed  for  sections  of  the  mesh 
that  are  associated  with  the  categorical  settings  of  discrete  neighbors  that  trigger  extended 
polling.  Thus,  surrogates  are  constructed  only  for  the  sections  of  the  mesh  in  which  there 
is  evidence  that  an  improved  mesh  point  may  lie. 

4-4  Design  of  Investigation 

Sequential  Selection  with  Memory  and  the  modified  MGPS  algorithm  were  imple¬ 
mented  in  NOMADrn  to  extend  the  code  to  stochastic  simulation  problems  and  to  improve 
the  computational  efficiency  of  the  underlying  algorithm.  A  computational  evaluation 
was  conducted  using  the  new  code  to  assess  the  performance  of  the  various  combinations 
of  surrogate  strategies  presented  in  this  chapter.  This  evaluation  consisted  of  a  series  of 
experiments  applied  to  three  standardized  test  problems  from  Schittkowski  [52] . 

Since  the  modified  algorithm  is  a  generalization  of  the  MGPS-RS  and  the  convergence 
theory  is  not  changed,  the  focus  of  the  computational  testing  is  the  use  of  standardized  tests 
to  compare  surrogate  designs.  The  objective  of  this  comparative  testing  is  to  determine  if, 
under  similar  termination  criteria,  the  use  of  surrogates  can  be  shown  to  provide  a  reduction 
in  the  number  of  function  evaluations  and  an  improved  solution,  where  the  quality  of  a 
solution  is  measured  by  its  proximity  to  the  optimal  solution  and  the  true  response  value 
at  the  terminating  point.  In  order  to  ensure  that  the  comparative  results  were  based  on 
non-confounded  factors,  a  full  factorial  design  was  used  in  this  testing  where  the  factors 
evaluated  included  the  type  of  surrogate  (either  Nadaraya- Watson  or  Kriging),  the  use  of  a 
merit  function,  and  the  use  of  a  trust  region.  After  encoding  and  running  the  deterministic 
test  problems,  stochastic  versions  were  constructed  for  testing  by  imbuing  the  response  with 
a  noise  signal  that  follows  a  standard  normal  distribution,  which  can  be  represented  as 
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F(x)~f(x)  +  N(  0,1) 


(4.10) 


where  F(x)  is  the  response  of  the  simulation  and  f(x)  is  the  noise- free  response  of  the 
system. 

In  order  to  represent  a  cross-section  of  standard  test  problems,  the  following  three 
nonlinear  unconstrained  problems  were  selected. 

Problem  1  (Powell  function) 

General  Polynomial:  f(x)  =  (x\  +  IOX2)2  +  5(x3  —  X4)2  +  (X2  —  2X3)4  +  10(xi  —  X4)4 
with  an  initial  point  xq  =  (3,  — 1,  0, 1)T, 
initial  objective  function  value  f(x 0)  =  235, 
optimal  point  x*  =  (0,  0, 0, 0)T, 
and  optimal  objective  function  value  f(x*)  =  0 

Problem  2 

Quadratic:  f(x)  =  xTQx  where  Qxj  =  j+j_1 ,  i,j  =  1,2, 3, 4  (4x4  Hilbert  Matrix) 
with  an  initial  point:  x  =  (—4,  —2,  —1.333,  —  1)T, 
initial  objective  function  value  f(x 0)  =  33.965, 
optimal  point  x*  =  (0,  0, 0, 0)T, 
and  optimal  objective  function  value  f(x*)  =  0 

Problem  3  (Rosenbrock  function) 

Sum  of  Squares:  f(x)  =  100(x2  —  x\)2  +  (1  —  aq)2  +  100(x3  —  x|)2  +  (1  —  X2)2 

+100(x4  -  x|)2  +  (1  -  x3)2 
with  an  initial  point:  x  =  (—1-2, 1,  —1.2, 1)T, 
initial  objective  function  value  f(x 0)  =  532.4, 
optimal  point  x*  =  (1,1,1, 1)T, 
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and  optimal  objective  function  value  f(x*)  =  0 


4-4-1  Deterministic  Runs 

To  characterize  the  general  impact  the  use  of  the  selected  surrogates  would  have  on  the 
test  problems,  the  results  of  using  no  surrogates  were  compared  to  each  of  the  surrogates  of 
this  study  (Nadaraya- Watson  and  Kriging  estimators)  without  the  use  of  a  trust  region  or 
merit  function  approach.  For  each  of  the  runs,  the  noise  was  removed  from  the  response 
so  that  any  difference  found  would  be  attributable  to  the  surrogate  used.  Although  the 
solution  becomes  deterministic  in  the  no-surrogate  case,  the  use  of  the  initial  LHS  procedure 
produces  randomness  in  the  results  when  surrogates  are  employed.  Thus,  in  addition  to 
the  use  of  a  terminating  criteria  of  either  5000  function  evaluations  or  <  0.0001,  100 

replications  of  each  test  problems  were  taken  whenever  the  Nadaraya- Watson  or  Kriging 
estimators  were  used. 

The  results  of  these  test  runs  show  a  mild  improvement  when  surrogates  are  used. 
Specifically,  reductions  were  observed  in  the  distance  from  the  true  minimizer  and  in  true 
objective  function  value  of  the  final  iteration.  The  number  of  function  evaluations  was 
also  lower  when  surrogates  were  used.  Replications  corresponding  to  surrogate  runs  with  a 
number  of  function  evaluations  less  than  the  median  are  correlated  with  runs  that  provide 
reductions  in  the  norrned  distance  and  true  function  evaluation.  Therefore,  the  surrogates 
provide  an  overall  improvement  when  used  on  these  test  problems.  Details  are  provided 
in  Appendix  A.l. 

4-4-%  Pilot  Study 

Since  the  optimization  algorithm  is  dependent  on  appropriately  selected  R&S  parame¬ 
ters  to  produce  useful  estimates  of  the  true  surface  response,  a  pilot  study  was  performed 
on  each  of  the  problems  to  determine  the  R&S  parameters  used  in  the  main  test  runs.  The 
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initial  estimates  for  the  proper  R&S  parameters  are  taken  from  the  work  of  Sriver  in  [55] . 
For  the  pilot  study,  the  number  of  initial  samples  was  fixed  at  5  and  the  decay  rate  for  both 
the  alpha  and  indifference  were  varied  identically  starting  at  0.95.  The  other  parameters 
investigated  for  change  were  the  initial  alpha  and  indifference  zone  parameters  which  were 
0.8  and  100,  respectively.  The  pilot  study  considered  alternative  settings  for  alpha  of  0.70, 
indifference  zone  of  80,  and  decay  parameters  of  0.90. 

By  conducting  a  pilot  study,  common  R&S  parameter  settings  for  the  problems  tested 
could  be  found  that  provide  an  appropriate  baseline  convergence  to  the  known  optimal 
point,  where  the  baseline  refers  to  the  implementation  without  surrogates.  Establishing 
a  baseline,  the  relative  worth  of  using  a  particular  surrogate  can  be  directly  judged  in 
comparative  analysis  with  the  runs  without  surrogates.  Thus,  the  full  factorial  pilot  study 
was  run  without  the  use  of  surrogates. 

Using  one  hundred  replications  of  each  design  setting,  the  quality  of  the  R&S  parame¬ 
ter  settings  was  judged  by  the  distance  from  the  terminating  solution  to  the  true  known 
optimum.  One  should  note  that  the  quality  of  the  solution  was  not  judged  by  the  result¬ 
ing  response  at  the  terminating  solution  as  the  response  itself  is  stochastic  and  is  related 
to  the  distance  to  the  optimum.  The  pilot  study  used  terminating  criteria  of  either  5000 
function  evaluations  or  A*,  <  0.0001.  Since  the  underlying  algorithm  produced  stochastic 
results  at  each  iteration,  shown  in  Appendix  A. 2,  the  resulting  distribution  of  the  termi¬ 
nating  solution  was  of  an  unknown  analytic  form. 

The  pilot  run  results  were  initially  analyzed  by  graphical  tools  to  provide  a  basic  char¬ 
acterization  of  the  underlying  probability  distributions.  Since  most  standard  statistical 
tools  for  comparing  distributions  rely  on  underlying  normal  distributions,  an  initial  charac¬ 
terization  of  the  distributional  form  is  important  in  justifying  the  use  of  parametric  tools. 
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If  the  distributions  are  not  normal  or  cannot  justifiably  be  transformed  to  a  normal  distri¬ 
bution,  then  the  distribution  must  be  compared  using  nonparametric  tools,  many  of  which 
have  the  same  power  as  parametric  methods.  The  outlier  box  plot  and  quantile  box  plot  of 
the  distributions  shown  in  Appendix  A. 2  provide  sufficient  evidence  that  the  distributions 
are  not  normal;  thus,  nonparametric  methods  were  used. 

For  the  pilot  study  results,  the  van  der  Waerden  test  was  performed.  This  test  is  used 
to  detect  whether  at  least  two  of  k  sample  populations  come  from  different  distributions  (see 
Sheskin  [53]).  By  transforming  the  rank-orders  into  a  set  of  standard  deviations  derived 
from  the  standard  normal  distribution,  the  van  der  Waerden  test  provides  statistical  power 
generally  equal  to  that  of  the  analogous  parametric  test.  The  null  hypothesis  is  that  the 
samples  are  derived  from  the  same  underlying  distribution,  and  its  rejection,  based  on  the 
given  level  of  significance,  indicates  that  there  is  enough  statistical  evidence  to  conclude 
that  the  different  R&S  parameters  produce  different  underlying  distributions.  Thus,  even 
if  transformations  were  used  to  normalize  the  distributions,  the  R&S  parameters  impact 
the  quality  of  the  terminating  solution.  Additionally,  the  quality  of  the  parameter  settings 
was  measured  by  the  median,  as  opposed  to  the  mean.  Because  each  distribution  may 
contain  extreme  values,  as  indicated  by  the  outlier  box  plots,  the  mean  of  the  distribution 
can  be  skewed.  Since  the  median  is  not  influenced  by  extreme  observations,  it  was  chosen 
as  a  more  appropriate  measure  of  central  tendency. 

The  results  of  the  pilot  runs,  shown  in  Appendix  A. 2,  reference  the  R&S  parameter 
design  by  concatenating  the  indifference  zone,  the  indifference  rate  of  decay,  the  alpha  level, 
and  the  alpha  rate  of  decay.  Thus,  the  R&S  settings  suggested  by  Sriver  can  be  represented 
as  100958095  for  an  indifference  zone  of  100,  alpha  level  of  0.80,  and  decay  rates  of  0.95. 
From  the  results,  R&S  parameter  settings  of  100958095,  100908090,  and  75907090  tended 
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to  produce  favorable  distance  results.  Therefore,  these  initial  conditions  settings  were  used 
in  the  computational  testing  of  the  main  run. 

4-4-3  Main  Runs 

Since  the  purpose  of  the  computational  testing  is  to  find  if,  starting  under  similar  con¬ 
ditions,  the  use  of  surrogates  can  be  shown  to  provide  an  improved  solution,  the  baseline 
of  no  surrogates  is  accompanied  by  each  of  the  surrogate  design  combinations  for  the  main 
run.  The  quality  of  solution  is  judged  by  both  the  proximity  to  the  optimal  solution  and 
the  true  objective  function  value  at  termination,  where  the  main  run  used  terminating  cri¬ 
teria  of  either  100,000  function  evaluations  or  A*.  <  0.0001.  Just  as  in  the  pilot  run,  box 
plots  of  the  distributions  are  used  to  initially  characterize  each  of  the  distributions.  The 
comparative  analysis  is  then  performed  both  on  the  grouped  results  of  all  three  R&S  pa¬ 
rameter  settings  and  for  each  R&S  parameter  individually.  When  the  box  plots  for  two 
or  more  distributions  indicate  that  there  may  be  similarity  between  the  distributions,  ei¬ 
ther  the  Wilcoxon  rank-sum  procedure  or  the  Kruskal- Wallis  procedure  is  performed.  The 
Wilcoxon  rank-sum  procedure  tests  the  hypothesis  that  the  two  sample  populations  are 
different  and  the  Kruskal- Wallis  procedure  can  be  considered  an  extension  of  the  Wilcoxon 
procedure  that  tests  whether  at  least  two  of  k  sample  populations  are  different.  These 
procedures  test  the  null  hypothesis  of  equal  distributions  through  the  comparison  of  trans¬ 
formed  rank  values  for  each  distribution.  Again,  the  median  is  used  in  measuring  the 
overall  quality  of  the  solution  for  each  of  the  distributions. 

4  ■  5  Computational  Results 

The  results  of  the  main  runs  are  shown  in  Appendix  A. 3,  where  the  abbreviations  NW 
and  Krig  are  used  to  identify  the  use  of  the  Nadaraya- Watson  and  Kriging  surrogates  and 
the  use  of  a  local  trust  region  and  a  merit  function  are  represented  by  a  boolean  indicator; 
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e.g.,  KrigLoclMerO  represents  the  use  of  a  Kriging  surrogate  with  a  local  trust  region  but 
not  a  merit  function.  Unlike  the  deterministic  runs,  the  results  indicate  that  the  use  of 
surrogates  in  the  present  implementation  does  not  provide  improvement  in  convergence  to  a 
local  optimum  and  the  number  of  function  evaluations  does  not  indicate  that  the  surrogates 
would  provide  an  overall  improvement  to  the  terminating  solution.  In  fact,  the  number 
of  function  evaluations  for  the  surrogate  runs  exceeded  the  no-surrogate  runs  without  any 
indication  of  an  improved  response.  However,  the  reason  for  this  failure  and  the  differences 
in  competing  surrogate  designs’  performances  provide  enough  information  to  warrant  some 
general  discussion. 

4-5.1  Main  Run  Results 

At  first,  failure  of  the  surrogates  to  provide  an  improved  response,  as  demonstrated  by 
the  deterministic  runs,  was  unexpected.  Since  the  surrogates  actively  use  prior  information 
from  the  cache  to  generate  trial  points,  it  was  assumed  that  the  surrogate  runs  would 
demonstrate  an  improved  terminating  solution  and  a  reduction  in  the  number  of  function 
evaluations.  However,  the  use  of  decaying  R&S  parameters  have  the  affect  of  improving 
the  cache  estimates  over  time  and,  other  than  the  incumbent,  are  never  updated  after  the 
initial  sampling.  Thus,  the  surrogates  are  autocorrelated  with  their  occurrence  within  the 
cache.  As  such,  the  surrogate  is  constructed  from  points  with  various  levels  of  confidence 
due  to  the  different  sampling  variances.  The  resulting  surrogate  is  such  a  poor  fit  for  the 
true  surface  that  the  generation  of  trial  points  decreases  the  computational  efficiency  of  the 
algorithm. 

The  results  also  indicate  that  the  use  of  a  trust  region  and  a  merit  function  can  impede 
the  algorithmic  progress  to  a  stationary  point  when  there  is  a  single  optimum.  The  sur¬ 
rogate  design  using  the  Kriging  estimator  with  a  combined  trust  region  and  merit  function 
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approach  (KrigLoclMerl)  provided  the  worst  solution  across  each  of  the  test  problems,  and 
the  surrogate  design  using  the  Nadaraya- Watson  estimator  with  a  combined  trust  region 
and  merit  function  approach  (NWLoclMerl)  frequently  performed  worse  than  the  other 
Nadaraya- Watson  based  designs.  Since  the  use  of  a  merit  function  forces  the  search  to  be¬ 
come  more  space-filling,  the  associated  test  runs  perform  a  more  global  search  than  other 
test  runs.  When  multiple  local  optima  are  present,  the  purpose  of  the  optimization  may 
be  to  find  an  area  that  simply  improves  the  current  solution;  thus  a  global  search  may 
be  appropriate.  However,  in  this  research,  each  of  the  test  functions  contained  a  single 
global  optimum.  By  rigorously  searching  the  feasible  region,  the  combined  trust  region 
and  merit  function  approach  requires  additional  function  evaluations  to  ensure  alternative 
areas  of  improvement  were  not  overlooked  during  the  convergence  to  a  stationary  point. 
Thus,  when  compared  to  test  runs  that  perform  more  local  searches,  searches  that  used  the 
combined  trust  region  and  merit  function  approach  performed  poorly.  However,  if  the  test 
problems  had  contained  many  local  optima,  it  is  likely  that  the  combined  approach  would 
have  performed  well. 

Additionally,  there  appears  to  be  a  strong  correlation  between  competing  sets  of  sur¬ 
rogate  designs.  In  each  of  the  test  problems,  when  a  local  trust  region  was  not  used, 
the  Nadaraya- Watson  estimator  performed  similarly  regardless  if  the  merit  function  was 
employed  or  not.  However,  when  the  Kriging  estimator  was  employed,  the  terminating 
solutions  for  designs  where  a  merit  function  was  not  employed  were  similar  to  each  other, 
regardless  if  the  local  merit  region  was  used.  Since  this  similarity  was  apparent  from  the 
box  plots,  both  of  these  cases  were  statistically  tested  for  equivalence  of  underlying  dis¬ 
tributional  form.  As  shown  in  Appendix  A. 3,  there  is  not  enough  statistical  evidence  to 
reject  the  null  hypothesis  of  equivalent  underlying  distributions  for  these  surrogate  designs. 
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These  observations  demonstrate  the  interpolatory  nature  of  each  of  these  surrogates.  Since 
the  Gaussian  kernel  used  for  the  Nadaraya- Watson  has  long  tails,  the  estimates  outside  the 
sampling  region  may  have  been  tending  to  distant  points.  Also,  since  the  optimal  was  cen¬ 
trally  located  within  the  domain  of  the  problem,  the  Kriging  estimator  may  have  ensured 
that  a  local  trust  region  was  maintained  by  modeling  the  covariance  as  Gaussian  distribu¬ 
tion. 

Computational  testing  demonstrated  that  using  a  cache  of  mean  responses,  calculated 
from  previously  evaluated  iterate  responses,  can  have  a  large  influence  on  the  quality  of  the 
terminating  solution  and  the  number  of  function  evaluations.  As  a  general  result  of  this 
research,  in  cases  where  the  quality  of  iterate  response  estimation  improves  as  the  algorithm 
progresses,  the  use  of  previous  iterate  responses  from  a  cache  can  negatively  impact  the 
performance  and  terminating  solution  of  an  optimization  algorithm.  In  this  research,  the 
use  of  previously  evaluated  iterates  impacts  both  the  R&S  procedure  used  for  solving  the 
iterate  selection  subproblem  and  the  quality  of  the  surrogate  used  for  locating  areas  of 
improved  response.  To  demonstrate  that  the  use  of  the  cache  negatively  impacted  the 
algorithm,  additional  testing  was  performed  by  modifying  the  R&S  procedure  as  discussed 
in  the  following  section. 

4-5.2  R&S  Modification  Results 

The  unmodified  R&S  procedure  used  for  the  main  runs  maintained  previously  stored 
values  for  the  current  incumbent  for  future  R&S  comparisons.  Since  the  quality  of  iter¬ 
ate  estimation  improves  asymptotically,  an  initial  bias  of  low  functional  responses  are  built 
into  the  cache  of  the  incumbent.  If  the  initial  estimates  for  the  incumbent  are  significantly 
below  the  true  response  value,  the  incumbent  may  not  be  replaced  by  an  iterate  whose  true 
response  is  lower  than  the  incumbent.  Thus,  the  algorithm  is  likely  to  produce  poor  terrni- 
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nating  solutions.  In  order  to  address  the  issue,  the  memory  portion  (the  storage  of  previ¬ 
ous  response  evaluations  for  the  incumbent)  was  removed  from  the  R&S  procedure  and  the 
main  runs  were  redone  with  the  modified  code.  The  removal  of  the  incumbent  memory  for 
the  R&S  procedure  enables  more  accurate  response  estimates  of  the  incumbent  by  requiring 
it  to  be  re-estimated  with  the  current  R&S  parameter  setting.  Thus,  response  estimation 
of  the  incumbent  is  improved  by  requiring  additional  function  evaluations.  From  the  com¬ 
parison  shown  in  Appendix  A. 4,  the  removal  of  former  incumbent  response  data  improved 
both  observed  and  true  responses  for  the  terminating  solution.  Since  the  underlying  al¬ 
gorithm  is  dependent  on  accurate  incumbent  information,  bias  in  the  sample  estimation  of 
the  incumbent  not  only  reduces  the  accuracy  of  the  incumbent  response,  but  also  reduces 
the  quality  of  the  terminating  solution.  By  using  the  modified  R&S  procedure,  the  termi¬ 
nating  solutions,  shown  in  Appendix  A. 5,  demonstrate  improvement  over  the  unmodified 
R&S  procedure  results  of  Appendix  A. 3.  Therefore,  the  initial  savings  in  functional  evalu¬ 
ations  provided  by  the  storage  of  the  incumbent  value  has  been  shown  to  be  an  ineffective 
means  of  improving  the  terminating  solution  for  stochastic  responses  for  the  optimization 
algorithm  used  in  this  research. 

The  use  of  previously  evaluated  iterates  also  impacts  the  quality  of  the  surrogate  used 
during  a  search.  Although  the  surrogates  are  constructed  during  each  SEARCH  step,  which 
has  the  affect  of  recalibrating  the  surrogate  prior  to  its  use,  the  design  iterates  used  to 
build  the  surrogate  are  not  recalibrated  to  ensure  homogeneity  of  the  sample  variances. 
As  noted  in  Section  3.6,  since  the  R&S  parameters  are  initially  set  loosely  and  are  decayed 
as  the  algorithm  progresses,  responses  selected  from  the  cache  may  be  inaccurate  and 
can  have  large  differences  in  sample  variances.  Thus,  the  surrogate  may  provide  a  poor 
fit  to  the  true  underlying  surface.  As  a  result,  the  mild  improvement  through  the  use 
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of  surrogates  suggested  by  the  deterministic  test  problems  runs  is  never  obtained  by  the 
stochastic  versions  of  the  test  problems. 

4 . 6  Summary 

This  chapter  demonstrated  the  applicability  of  the  optimization  algorithm  presented 
in  this  research  under  various  surrogate  designs  by  comparing  the  quality  of  terminating 
solutions  and  number  of  function  evaluations  on  a  set  of  standard  test  problems.  By  an¬ 
alyzing  the  results  of  the  computational  runs,  some  general  conclusions  about  the  experi¬ 
ment  design  and  the  modified  algorithm  could  be  made.  Issues  raised  by  the  analysis  are 
discussed  in  the  next  chapter  in  the  context  of  future  work 


Chapter  5  -  Conclusions  and  Recommendations 

This  research  effort,  relating  a  generalized  pattern  search  for  mixed  variable  domains 
and  ranking  and  selection  procedures,  provides  an  algorithm  that  can  efficiently  solve 
simulation-based  optimization  problems  of  "black  box"  systems.  Through  modification 
of  the  underlying  MGPS  algorithm,  NOMADrn  provides  enhanced  support  of  surrogates 
and  is  now  applicable  to  the  optimization  of  mixed  variable  stochastic  simulation  problems 
with  linear  constraints. 

5.1  Future  Research 

Comparative  testing  performed  in  this  research  has  shown  the  relative  importance  of 
the  R&S  parameters,  the  selection  of  surrogates,  and  the  surrogate  approaches.  As  with 
many  research  efforts,  the  observations  open  the  door  for  more  discoveries  in  the  use  of 
surrogates,  with  and  without  R&S  procedures,  for  simulation-based  optimization.  The 
following  sections  present  recommendations  for  further  research  that  could  serve  to  enhance 
the  performance  of  mixed  variable  programming  problem  solution  techniques. 

5.1.1  Nonlinear  Constraints 

As  noted  in  Section  4.3,  the  sampling  rule  for  infeasible  points  is  based  simply  on 
establishing  a  minimal  level  of  confidence  on  the  response.  Since  infeasible  points  are  used 
as  design  points  for  the  construction  of  a  surrogate,  a  moderate  level  of  confidence  should 
be  maintained  for  the  responses,  which  the  use  of  minimal  sampling  may  not  provide. 
However,  NOMADrn  was  originally  designed  for  optimization  of  nonlinearly  constrained 
mixed  variable  problems  through  a  filter  approach.  Since  the  estimation  of  responses  are 
directly  used  as  a  filter  criteria,  a  new  method  for  controlling  the  sampling  error  of  infeasible 
iterates  will  have  to  be  considered  in  order  to  appropriately  apply  NOMADrn  to  nonlinear 
problems. 
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For  filter  algorithms,  the  goal  is  to  minimize  two  functions,  the  objective  /  and  a 
continuous  aggregate  constraint  violation  function  h.  where  h(x)  >  0  if  and  only  if  x  is 
feasible.  A  filter  is  a  set  of  non-dominated  points  with  respect  to  /  and  h  (see  Audet  and 
Dennis  [8]),  and  steps  are  considered  successful  whenever  improvement  in  either  function 
is  found.  By  polling  around  the  least  infeasible  points  in  the  filter  or  the  best  feasible 
point,  the  goal  is  to  improve  either  the  least  infeasible  point  or  the  best  feasible  point. 
If  the  minimizer  lies  on  the  boundary  of  the  feasible  region,  a  subsequence  of  iterates 
that  approach  the  minimizer  can  be  generated  without  requiring  the  polling  directions  to 
conform  to  the  boundary. 

The  filter  approach  can  be  employed  quickly  for  deterministic  simulations  since  the 
parameters  for  each  iterate,  whether  feasible  or  infeasible,  are  known.  However,  the  use  of 
a  filter  for  stochastic  simulation  requires  a  modification  to  the  general  approach  because 
the  objective  function  is  only  available  stochastically  and  is  typically  reduced  by  repeated 
sampling.  A  possible  approach  for  controlling  the  sampling  error  is  the  use  of  the  current 
R&S  parameters  of  the  feasible  iterates  to  drive  the  sampling  requirement  for  infeasible 
points.  By  the  assumption  of  normality  in  the  error  terms,  the  current  alpha  level  used 
in  the  R&S  procedures  for  the  feasible  points  could  be  used  to  determine  the  number 
of  replications  required  of  the  infeasible  points  to  establish  a  given  level  of  significance. 
Additionally,  the  handling  of  stochastic  constraints  might  be  addressed  by  the  use  of  a 
filter. 

Another  approach  to  the  handling  of  nonlinear  constraints  is  to  extend  the  MADS  al¬ 
gorithm  to  stochastic  MVP  problems.  As  noted  in  Section  2.4.2,  MADS  is  a  generalization 
of  GPS  for  handling  nonlinear  constraints  that  provides  stronger  convergence  results  than 
MGPS.  It  does  so  by  generating  a  dense  set  of  mesh  directions  in  the  limit.  Poll  directions 
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are  chosen  at  each  iteration  from  a  ever  increasing  set  of  mesh  directions.  The  develop¬ 
ment  of  such  a  class  of  algorithms  would  allow  the  solution  of  a  previously  nn solvable  class 
of  problem. 

5.1.2  R&S  Modifications 

As  part  of  the  convergence  theory  used  in  MGPS-RS,  Sriver  [55]  demonstrated  that 
the  use  of  decay  parameters  in  the  R&S  procedure  results  in  asymptotic  behavior  of  the  al¬ 
gorithm,  where  the  statistical  significance  of  the  selected  iterate  improves  as  the  algorithm 
progresses.  Therefore,  the  quality  of  visited  solutions  varies  as  the  algorithm  progresses. 
Since  surrogates  are  built  from  past  responses,  the  use  of  a  nonhomogeneous  pool  of  so¬ 
lutions  can  lead  to  a  poor  approximation  of  the  true  response  surface.  A  method  for 
improving  the  surrogates  is  to  resample  on  a  subset  of  previously  visited  design  points  and 
replicate  the  points  until  the  desired  level  of  homogeneity  is  achieved.  The  resampling 
could  be  performed  after  a  fixed  number  of  R&S  procedures  (which  is  equivalent  to  setting 
fixed  alpha  level  thresholds)  or  based  on  the  pooled  sample  variance  of  the  stored  responses 
(resampling  based  on  the  individual  versus  pooled  variance). 

Another  area  of  current  research  involves  balancing  the  cost  of  sampling  with  switching. 
R&S  procedures  that  use  fully  sequential  sampling,  such  as  the  one  used  in  this  research, 
require  repeated  switching  between  different  model  scenarios.  Since  the  computational 
overhead  of  switching  may  be  expensive,  new  methods  are  being  developed  (see  Hong  and 
Nelson  [27])  that  reduce  the  number  of  required  switches  while  maintaining  the  required 
level  of  significance. 

5.1.3  Surrogate  Modifications 

As  shown  in  Equation  (4.6),  the  Kriging  model  can  be  tailored  by  adding  additional 
problem  knowledge  to  the  selection  of  basis  functions.  In  this  research  the  basis  functions 
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were  set  to  be  linear  functions,  since  no  assumptions,  other  than  smoothness,  of  the  true 
surface  were  made.  In  true  engineering  problems  however,  additional  knowledge  of  the 
problem  is  usually  available.  Through  the  use  of  alternative  basis  functions,  the  Kriging 
model  should  be  able  to  more  accurately  approximate  the  surface.  An  area  of  future 
research  is  to  develop  a  Kriging  model  that  can  change  as  the  pattern  search  algorithm 
progresses  as  additional  information  is  acquired. 

5.2  Summary 

Generalized  pattern  search  provides  an  effective  method  to  solve  very  difficult  opti¬ 
mization  problems,  e.g.  mixed  variable  programming.  This  research  has  addressed  the 
computational  efficiency  for  optimizing  simulated  systems.  The  modifications  to  the  mixed 
variable  generalized  pattern  search  were  described  and  implemented  within  the  NOMADrn 
software  package.  The  modified  algorithm  was  tested  using  three  stochastic  test  problems. 
Overall  results  were  unfavorable  due  to  the  poor  fit  of  the  surrogates  and  lead  to  sample 
variance  reduction  as  a  suggestion  for  future  work. 
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APPENDIX  A  -  Supporting  Data  Summary 


A.l  Deterministic  Run  Data  Summary 


This  appendix  provides  the  details  of  the  initial  deterministic  response  comparison 


used  to  establish  the  relative  quality  of  solution  that  surrogates  provide  when  there  is  no 


noise  in  the  response  signal.  For  each  of  the  test  problems,  the  box  plot  and  quantiles  for 


the  performance  measurements  are  presented. 

Test  Problem  1 

Normed  Distance  True  Function  Evaluation 


Surrogate_Design 


Quantiles 

Level  10%  25%  Median  75%  90% 

Krig  0.065581  0.068409  0.070448  0.071872  0.0732 

NW  0.072333  0.072333  0.072333  0.072333  0.072333 

None  0.072333  0.072333  0.072333  0.072333  0.072333 


Surrogate_Design 


Quantiles 

Level  10%  25%  Median  75%  90% 

Krig  0.000018  0.00002  0.000023  0.000025  0.000027 

NW  0.000026  0.000026  0.000026  0.000026  0.000026 

None  0.000026  0.000026  0.000026  0.000026  0.000026 


Cumulative  Distribution  Functions 


93 


Test  Problem  2 


Normed  Distance 


True  Function  Evaluation 


Quantiles 


Level 

10% 

25% 

Krig 

0.983785 

1 .737309 

NW 

0.500934 

1 .599949 

None 

0.839932 

0.839932 

Wmmmmmm 

Median  75%  90% 

3.65214  5.745399  10.06253 
4.562427  7.771205  11.02117 
0.839932  0.839932  0.839932 


Quantiles 


Level 

10% 

25% 

Krig 

0.000099 

0.000303 

NW 

0.000026 

0.000248 

None 

0.000072 

0.000072 

Median  75%  90% 

0.001352  0.00337  0.010395 

0.002095  0.006105  0.012419 
0.000072  0.000072  0.000072 


Cumulative  Distribution  Functions 
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Test  Problem  3 


Normed  Distance 


True  Function  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig 

0.058642 

0.072403 

0.072403 

0.117245 

0.182248 

NW 

0.072102 

0.072102 

0.072102 

0.072102 

0.072102 

None 

0.072102 

0.072102 

0.072102 

0.072102 

0.072102 

Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig 

0.00087 

0.001339 

0.001339 

0.003587 

0.00881  4 

NW 

0.001324 

0.001324 

0.001324 

0.001324 

0.001324 

None 

0.001324 

0.001324 

0.001324 

0.001324 

0.001324 

Cumulative  Distribution  Functions 
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Number  of  Function  Evaluations 


Surrogate_Design 


Krig  MW  None 


Surrogate_Design 


Test  Problem  1 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig 

2230 

2901 

3079 

3311.75 

3442.9 

NW 

3247 

3297 

3297 

3298 

3298 

None 

3285 

3285 

3285 

3285 

3285 

Test  Problem  2 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig 

5000 

5001 

5002.5 

5005 

5006 

NW 

5000 

5001 

5003 

5005 

5006 

None 

5000 

5000 

5000 

5000 

5000 

Test  Problem  3 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig 

802.6 

5000 

5004 

5005 

5005 

NW 

5001 

5001 

5001 

5002 

5002 

None 

5005 

5005 

5005 

5005 

5005 
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A. 2  Pilot  Study  Data  Summary 

This  appendix  provides  the  details  of  the  results  of  the  pilot  study  used  to  establish 
the  ranking  and  selection  parameters  for  the  main  computational  runs.  As  noted  in  Sec¬ 
tion  4.4,  the  distance  to  the  optimal  point  was  used  as  the  quality  measure  for  each  of  the 
R&S  parameter  settings.  For  each  of  the  test  problems,  the  box  plot  and  statistical  test 
are  first  presented  and  supported  by  the  following  individual  R&S  parameter  histograms 
and  descriptive  statistics.  The  labeling  of  each  R&S  procedure  design  is  provided  by  the 
concatenation  of  the  indifference  zone  parameter,  the  indifference  zone  decay  rate,  the  al¬ 
pha  level,  and  the  alpha  decay  parameter,  e.g.  100957095  indicates  an  initial  indifference 
zone  of  100,  alpha  level  of  0.70,  and  common  decay  parameters  of  0.95. 
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Norm 


Test  Problem  1 


Normed  Distance 


Statistical  Evaluation 


Van  der  Waerden  Test  (Normal  Quantiles) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

75907090 

400 

-28.8684 

-0.07217 

-1.546 

75908090 

400 

59.0344 

0.14759 

3.162 

75957095 

400 

1 .4384 

0.00360 

0.077 

75958095 

400 

11.8190 

0.02955 

0.633 

100907090 

400 

40.9612 

0.10240 

2.194 

100908090 

400 

-34.2081 

-0.08552 

-1.832 

100957095 

400 

-10.2777 

-0.02569 

-0.551 

100958095 

400 

-39.8987 

-0.09975 

-2.137 

j  1-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 

22.6091  7  0.0020 
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7TI  i  i  1 

AU  1  1  1 

jl 

tUlq.tfp|l,qH-'Pr-,il,^pnrPTr 

Quantiles  Moments 


100.0% 

maximum 

1 .6002 

Mean 

0.4086984 

99.5% 

1 .6002 

Std  Dev 

0.2838808 

97.5% 

1 .2436 

Std  Err  Mean 

0.014194 

90.0% 

0.7680 

upper  95%  Mean 

0.4366028 

75.0% 

quartile 

0.5325 

lower  95%  Mean 

0.3807939 

50.0% 

median 

0.3421 

N 

400 

25.0% 

quartile 

0.2226 

10.0% 

0.1292 

2.5% 

0.0327 

0.5% 

0.0157 

0.0% 

minimum 

0.0157 

0  .5  1  1 .5 
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Noun 
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jJ 
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.5  1  1 .5 

Quantiles 

M|| 

Moments 

■■■) 

100.0% 

maximum 

1.5170 

Mean 

0.4792899 

99.5% 

1.5170 

Std  Dev 

0.3345129 

97.5% 

1 .3607 

Std  Err  Mean 

0.0167256 

90.0% 

0.9894 

upper  95%  Mean 

0.5121713 

75.0% 

quartile 

0.6215 

lower  95%  Mean 

0.4464085 

50.0% 

median 

0.3655 

N 

400 

25.0% 

quartile 

0.2486 

10.0% 

0.1509 

2.5% 

0.0668 

0.5% 

0.0117 

0.0% 

minimum 

0.0117 
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0 

.5  1  1 .5 

Quantiles 

II 

Moments 

1 

100.0% 

maximum 

1 .4322 

Mean 

0.4289087 

99.5% 

1 .4322 

Std  Dev 

0.2800687 

97.5% 

1.1264 

Std  Err  Mean 

0.0140034 

90.0% 

0.8625 

upper  95%  Mean 

0.4564385 

75.0% 

quartile 

0.5390 

lower  95%  Mean 

0.401379 

50.0% 

median 

0.3906 

N 

400 

25.0% 

quartile 

0.2408 

10.0% 

0.1374 

2.5% 

0.0305 

0.5% 

0.0031 

0.0% 

minimum 

0.0031 
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Quantiles 

Ml 

Moments 

■■HI 

100.0% 

maximum 

1 .2560 

Mean 

0.41 89588 

99.5% 

1 .2560 

Std  Dev 

0.2371007 

97.5% 

0.9904 

Std  Err  Mean 

0.011855 

90.0% 

0.7437 

upper  95%  Mean 

0.4422649 

75.0% 

quartile 

0.5413 

lower  95%  Mean 

0.3956526 

50.0% 

median 

0.3642 

N 

400 

25.0% 

quartile 

0.2713 

10.0% 

0.1495 

2.5% 

0.0747 

0.5% 

0.0110 

0.0% 

minimum 

0.0110 

0  .5  1  1 .5 
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Quantiles 

_ II 

Moments 

1 

100.0% 

maximum 

1 .4942 

Mean 

0.461 7959 

99  5% 

1 .4942 

Std  Dev 

0.3092101 

97.5% 

1 .3058 

Std  Err  Mean 

0.0154605 

90.0% 

0.9040 

upper  95%  Mean 

0.4921901 

75.0% 

quartile 

0.6358 

lower  95%  Mean 

0.4314017 

50.0% 

median 

0.3799 

N 

400 

250% 

quartile 

0.2452 

10.0% 

0.1336 

2.5% 

0.0716 

0.5% 

0  0369 

0  0% 

minimum 

0.0369 
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Noun 


— QEb-  ", 

H  10 

— - H 

jit 

0 

.5  1  1 .5 

I 


Ou.intiles 

II 

Moments 

m 

100.0% 

maximum 

1 .3052 

Mean 

0.3948016 

99.5% 

1 .3052 

Std  Dev 

0.2527534 

97.5% 

1 .2431 

Std  Err  Mean 

0.0126377 

90.0% 

0.6917 

upper  95%  Mean 

0.4196463 

75.0% 

quartile 

0.5064 

lower  95%  Mean 

0.3699569 

50.0% 

median 

0.3261 

N 

400 

25.0% 

quartile 

0.2281 

10.0% 

0.1445 

2.5% 

0.0895 

0.5% 

0.0625 

0.0% 

minimum 

0.0625 

R&S  Design  100957095 


Norm 


—m 

- «f  •>  ? 

M  11)1 

,  i 

1 

Jl 

liJ.fliJ, ,  ,nn  i ,  P . 

1  1.5 
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HI 
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shni 

100.0% 

maximum 

1 .2741 

Mean 

0.4047168 

99.5% 

1 .2741 

Std  Dev 

0.2438108 

97.5% 

1 .0902 

Std  Err  Mean 

0.0121 905 

90.0% 

0.7902 

upper  95%  Mean 

0.4286825 

75.0% 

quartile 

0.4748 

lower  95%  Mean 

0.3807511 

50.0% 

median 

0.3536 

N 

400 

25.0% 

quartile 

0.2468 

10.0% 

0.1614 

2.5% 

0  0786 

0.5% 

0.0625 

0.0% 

minimum 

0.0625 
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Ou.intiles 

HI 

Moments 

^H 

100.0% 

maximum 

1 .5000 

Mean 

0.3887176 

99.5% 

1 .5000 

Std  Dev 

0.2446042 

97.5% 

1.1902 

Std  Err  Mean 

0.0122302 

90.0% 

0.6397 

upper  95%  Mean 

0.4127613 

75.0% 

quartile 

0.4961 

lower  95%  Mean 

0.3646739 

50.0% 

median 

0.3486 

N 

400 

25.0% 

quartile 

0.2479 

10.0% 

0.1256 

2.5% 

0.0626 

0.5% 

0.0358 

0.0% 

minimum 

0.0358 
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Test  Problem  2 


Normed  Distance 


Statistical  Evaluation 


Van  der  Waerden  Test  (Normal  Quantiles) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

75907090 

400 

-23.2475 

-0.05812 

-1.245 

75908090 

400 

43.8949 

0.10974 

2.351 

75957095 

400 

32.7966 

0.08199 

1.757 

75958095 

400 

38.3050 

0.09576 

2.052 

100907090 

400 

15.8731 

0.03968 

0.850 

100908090 

400 

-78.3039 

-0.19576 

-4.194 

100957095 

400 

-9.1368 

-0.02284 

-0.489 

100958095 

400 

-20.1814 

-0.05045 

-1.081 

1-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 
29.8357  7  0.0001 
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Quantiles 

mmi 

Moments 

100.0% 

maximum 

6.8750 

Mean 

3.120836 

99.5% 

6.8750 

Std  Dev 

1.174905 

97.5% 

6.0503 

Std  Err  Mean 

0.0587452 

90.0% 

4.6551 

upper  95%  Mean 

3.2363248 

75.0% 

quartile 

3.6124 

lower  95%  Mean 

3.0053471 

50.0% 

median 

2.9628 

N 

400 

25.0% 

quartile 

2.3671 

10.0% 

1 .7557 

2.5% 

1.1371 

0.5% 

0.6850 

0.0% 

minimum 

0.6850 
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Quantiles 

HI 

Moments 

mm 

100.0% 

maximum 

10.821 

Mean 

3.362894 

99.5% 

10.821 

Std  Dev 

1 .3960099 

97.5% 

6.438 

Std  Err  Mean 

0.0698005 

90.0% 

4.984 

upper  95%  Mean 

3.5001167 

75.0% 

quartile 

4.036 

lower  95%  Mean 

3.2256713 

50.0% 

median 

3.057 

N 

400 

25.0% 

quartile 

2.585 

10.0% 

1.928 

2.5% 

1.526 

0.5% 

1.166 

0.0% 

minimum 

1.166 
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Moments 

HHl 

100.0% 

maximum 

9.8823 

Mean 

3.3958317 

99.5% 

9.8823 

Std  Dev 

1.5631973 

97.5% 

7.3880 

Std  Err  Mean 

0.0781599 

90.0% 

5.8822 

upper  95%  Mean 

3.5494884 

75.0% 

quartile 

4.2522 

lower  95%  Mean 

3.2421751 

50.0% 

median 

3.0162 

N 

400 

25.0% 

quartile 

2.2711 

10.0% 

1 .8260 

2.5% 

1 .3250 

0.5% 

0.7527 

0.0% 

minimum 

0.7527 

0  1  2  3  4  5  6  7  8  9  10  11  1213 
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Ou.mtiles 

■II 

Moments 

mm 

100.0% 

maximum 

11.768 

Mean 

3.355773 

99.5% 

11.768 

Std  Dev 

1 .4864724 

97.5% 

7.093 

Std  Err  Mean 

0.0743236 

90.0% 

4.838 

upper  95%  Mean 

3.5018878 

75.0% 

quartile 

3.986 

lower  95%  Mean 

3.2096581 

50.0% 

median 

3.103 

N 

400 

25  0% 

quartile 

2.553 

10.0% 

1.785 

2.5% 

1.265 

0.5% 

0.893 

0.0% 

minimum 

0.893 
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On  o  utiles 


Moments 


100.0% 

maximum 

12.076 

Mean 

3.259396 

99.5% 

12.076 

Std  Dev 

1 .3838742 

97.5% 

5.867 

Std  Err  Mean 

0.0691937 

90.0% 

4.573 

upper  95%  Mean 

3.3954258 

75.0% 

quartile 

3.796 

lower  95%  Mean 

3.1233662 

50.0% 

median 

3.138 

N 

400 

25.0% 

quartile 

2.529 

10.0% 

1.646 

2.5% 

1.098 

0.5% 

0.682 

0.0% 

minimum 

0.682 
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Ou.intiles 

■nil 

Moments 

B 1 

100.0% 

maximum 

5.9819 

Mean 

2.9613943 

99.5% 

5.9819 

Std  Dev 

1.1120964 

97.5% 

5.3969 

Std  Err  Mean 

0.0556048 

90.0% 

4.4163 

upper  95%  Mean 

3.0707093 

75.0% 

quartile 

3.5972 

lower  95%  Mean 

2.8520793 

50.0% 

median 

2.7981 

N 

400 

25.0% 

quartile 

2.3808 

10.0% 

1 .6424 

2.5% 

0.8079 

0.5% 

0.4744 

0.0% 

minimum 

0.4744 
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Ou.intiles 

II 

Moments 

■Hi 

100.0% 

maximum 

7.7178 

Mean 

3.2219506 

99.5% 

7.7178 

Std  Dev 

1 .4033233 

97.5% 

7.3417 

Std  Err  Mean 

0.0701662 

90.0% 

4.8784 

upper  95%  Mean 

3.3598921 

75.0% 

quartile 

3.8744 

lower  95%  Mean 

3.084009 

50.0% 

median 

2.9138 

N 

400 

25.0% 

quartile 

2.2728 

10.0% 

1 .7867 

2.5% 

1 .2051 

0.5% 

1.1067 

0.0% 

minimum 

1.1067 
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Ou.intiles 

■■II 

Moments 

1 

100.0% 

maximum 

6.4566 

Mean 

3.1227936 

99.5% 

6.4566 

Std  Dev 

1.1193589 

97.5% 

5.7538 

Std  Err  Mean 

0.0559679 

90.0% 

4.5164 

upper  95%  Mean 

3.2328225 

75.0% 

quartile 

3.8744 

lower  95%  Mean 

3.0127647 

50.0% 

median 

2.9868 

N 

400 

25.0% 

quartile 

2.4124 

10.0% 

1 .7296 

2.5% 

1.3217 

0.5% 

1.1505 

0.0% 

minimum 

1.1505 
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Test  Problem  3 


Normed  Distance 


Statistical  Evaluation 


Van  der  Waerden  Test  (Normal  Quantiles) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

75907090 

400 

-9.6381 

-0.02410 

-0.516 

75908090 

400 

41.4832 

0.10371 

2.222 

75957095 

400 

38.1687 

0.09542 

2.044 

75958095 

400 

1.1646 

0.00291 

0.062 

100907090 

400 

20.0106 

0.05003 

1.072 

100908090 

400 

-37.3491 

-0.09337 

-2.001 

100957095 

400 

-27.1663 

-0.06792 

-1.455 

100958095 

400 

-26.6736 

-0.06668 

-1.429 

ll-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 

16.3606  7  0.0220 
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II 
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■B) 

100.0% 

maximum 

0.51564 

Mean 

0.1 96242 

99.5% 

0.51564 

Std  Dev 

0.1072782 

97.5% 

0.48936 

Std  Err  Mean 

0.0053639 

90.0% 

0.40945 

upper  95%  Mean 

0.2067871 

75.0% 

quartile 

0.21049 

lower  95%  Mean 

0.185697 

50.0% 

median 

0.16301 

N 

400 

25.0% 

quartile 

0.13237 

10.0% 

0.10364 

2.5% 

0.05036 

0.5% 

0.03797 

0.0% 

minimum 

0.03797 
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■1 

100.0% 

maximum 

0.58774 

Mean 

0.2182842 

99.5% 

0.58774 

Std  Dev 

0.1310385 

97.5% 

0.52423 

Std  Err  Mean 

0.0065519 

90.0% 

0.45620 

upper  95%  Mean 

0.2311648 

75.0% 

quartile 

0.24730 

lower  95%  Mean 

0.2054036 

50  0% 

median 

0.17017 

N 

400 

25.0% 

quartile 

0.13942 

10.0% 

0.10098 

2.5% 

0.04358 

0.5% 

0.03032 

0.0% 

minimum 

0.03032 
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maximum 

0.68686 

Mean 

0.2092138 

99.5% 

0.68686 

Std  Dev 

0.1228344 

97.5% 

0.48728 

Std  Err  Mean 

0.0061417 

90.0% 

0.44768 

upper  95%  Mean 

0.221288 

75.0% 

quartile 

0.24416 

lower  95%  Mean 

0.1971396 

50.0% 

median 

0.18218 

N 

400 

25.0% 

quartile 

0.13351 

10.0% 

0.07641 

2.5% 

0.04568 

0.5% 

0.03377 

0.0% 

minimum 

0.03377 
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100.0% 

maximum 

0.65357 

Mean 

0.2045575 

99.5% 

0.65357 

Std  Dev 

0.1286606 

97.5% 

0.54433 

Std  Err  Mean 

0.006433 

90.0% 

0.42291 

upper  95%  Mean 

0.2172044 

75.0% 

quartile 

0.23871 

lower  95%  Mean 

0.1919106 

50.0% 

median 

0.17269 

N 

400 

25.0% 

quartile 

0.14071 

10.0% 

0.06553 

2.5% 

0.03372 

0.5% 

0.02224 

0.0% 

minimum 

0.02224 
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100.0% 

maximum 

0.94237 

Mean 

0.2156365 

99.5% 

0.94237 

Std  Dev 

0.1 409493 

97.5% 

0  51748 

Std  Err  Mean 

0.0070475 

90.0% 

0.43428 

upper  95%  Mean 

0.2294913 

75.0% 

quartile 

0.25239 

lower  95%  Mean 

0.2017817 

50.0% 

median 

0.16578 

N 

400 

25.0% 

quartile 

0.1 31 69 

10.0% 

0  09512 

2.5% 

0.04891 

0.5% 

0.01648 

0.0% 

minimum 

0.01648 
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StdDev 
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97.5% 
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Std  Err  Mean 

0.0055778 

90.0% 

0.39710 

upper  95%  Mean 

0.2016514 

75.0% 

quartile 

0.20934 

lower  95%  Mean 

0.1797204 

50.0% 

median 

0.16298 

N 

400 

25  0% 

quartile 

0.12917 

10.0% 

0.08045 

2.5% 

0.03901 

0.5% 

0.03012 

0.0% 

minimum 

0.03012 
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Norm 


—m — 

t 

1  1  LLLI  ' 1 

TTm-lf 

L 

lml 

0  .1 

.2  .3  .4 

"  1 " 

.5 

|  1  1  1  |  1  II  |  1  M  |  1  1  1  | 

6  .7  .8  .9  1 

Quantiles 

II 

Moments 

100.0% 

maximum 

0.57902 

Mean 

0.2030319 

99.5% 

0.57902 

StdDev 

0.1225948 

97.5% 

0.45686 

Std  Err  Mean 

0.0061297 

90.0% 

0.42037 

upper  95%  Mean 

0.2150825 

75.0% 

quartile 

0.25284 

lower  95%  Mean 

0.1909813 

50.0% 

median 

0.16908 

N 

400 

25.0% 

quartile 

0.13039 

10.0% 

0.07162 

2.5% 

0.02676 

0.5% 

0.01768 

0.0% 

minimum 

0.01768 
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Norm 


— H— ,!*1’ 

1*  '  LLif  1 

r| 

ml 

i  mrmL 

1 

0  .1 

.2  .3  4 

■  1 1  ■  1 1 1  ■  ■  i ■  1. 1 1 ■  ■  1 1 1 1 1 

.5  .6  .7  .8  .9  1 

Oii.intiles 

II 

Moments 

Mi 

100.0% 

maximum 

0.50568 

Mean 

0.1999253 

99.5% 

0.50568 

Std  Dev 

0.1217489 

97.5% 

0.46484 

Std  Err  Mean 

0.0060874 

90.0% 

0.44251 

upper  95%  Mean 

0.2118927 

75.0% 

quartile 

0.20794 

lower  95%  Mean 

0.1879578 

50.0% 

median 

0.16723 

N 

400 

25.0% 

quartile 

0.12762 

10.0% 

0.08472 

2  5% 

0.02730 

0.5% 

0.01757 

0.0% 

minimum 

0.01757 
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A. 3  Main  Run  Data  Summary 

This  appendix  provides  the  details  of  the  results  of  the  main  run  used  to  determine  if 
the  use  of  surrogates  improves  the  quality  of  the  terminating  solution.  As  noted  in  Section 
4.4,  the  distance  to  the  optimal  point  was  used  as  the  quality  measure  for  each  design  set¬ 
ting.  For  each  of  the  test  problems,  the  box  plot  and  statistical  test  are  first  presented  and 
supported  by  the  following  individual  design  setting  histograms  and  descriptive  statistics. 
The  labeling  of  each  surrogate  design  is  provided  by  the  concatenation  of  the  surrogate 
used  with  boolean  indicators  for  the  use  of  a  local  trust  region  and  merit  function,  thus, 
KrigLoclMerO  represents  the  use  of  a  Kriging  surrogate  with  a  local  trust  region  but  not  a 
merit  function. 


107 


Test  Problem  1 


Combined  Norm 


Statistical  Evaluation 


Oiidiitiles 


Level 

10% 

25% 

Median 

75% 

90% 

KrigLocOMerO 

0.141793 

0.25027 

0.453448 

0.75915 

1 .05873 

KrigLocOMerl 

0.189633 

0.35205 

0.662344 

1 .073494 

1.491015 

KrigLocI  MerO 

0.126106 

0.249604 

0.38015 

0.739609 

1.144514 

KrigLocI  Merl 

0.357085 

0.674737 

1.418628 

2.564829 

4.1 80494 

NiM-Oc0Mer0 

0.131703 

0.23438 

0.308773 

0.530212 

0.927676 

WVLocOMerl 

0.132827 

0.226965 

0.342431 

0.614611 

1 .020827 

Ni/VLod  MerO 

0.244269 

0.462968 

0.798893 

1.137354 

1 .303267 

MA/Locl  Merl 

0.287717 

0.499921 

0.923669 

1 .328939 

1.6491 44 

None 

0.100629 

0.180662 

0.290058 

0.516337 

0.817699 

R&S  Design  100908090 


R&S  Design  100958095 
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Test  Problem  1 


RrigLocOMerO  and  KrigLoclMerO  Comparison 


Normed  Distance 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-MeanO)/StdO 

KrigLocOMeiO  300  92157.5  307.192  0.945 

KrigLoclMerO  300  88142.5  293.808  -0.945 

2-Sample  Test,  Normal  Approximation 

S  Z  Prob>  |Z| 

88142.5  -0.94532  0.3445 

rl-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 

0.8941  1  0.3444 


True  Objective  Function 


Statistical  Evaluation 


3.5- 

3- 

2.5- 

_  2- 

1.5- 

1  - 

0.5- 

0- 

3 

I 

KrigLocOMerO  KrigLocI  MerO 

Surrogate_Design 

Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-Mean0)/Std0 

KrigLocOMerO  300  92877  309.590  1.284 

KrigLoclMerO  300  87423  291.410  -1.284 

2-Sample  Test,  Normal  Approximation 

S  Z  Prob>  |Z| 

87423  -1.28421  0.1991 

1-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 
1.6498  1  0.1990 
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Test  Problem  1 


NWLocOMerO,  NWLocOMerl,  and  None  Comparison 


Normed  Distance 


Statistical  Evaluation 


NWLocOMerO  NWLocOMerl  None 


Surrogate_Design 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level 

Count 

Score  Sum 

Score  Mean  (Mean-Mean0)/Std0 

NWLocOMerO 

300 

136238.5 

454.128 

0.296 

NWLocOMerl 

300 

141824.5 

472.748 

1.815 

None 

300 

127387 

424.623 

-2.112 

M-way  Test,  ChiSquare  Approximation 

ChiSquare 

DF 

Prob>ChiSq 

5.2287 

2 

0.0732 

True  Objective  Function 


Statistical  Evaluation 


Wilcoxon/  Kruskal-Wallis  Tests  (Rank  Sums) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

NWLocOMerO 

300 

135851.5 

452.838 

0.191 

NWLocOMerl 

300 

138700.5 

462.335 

0.966 

None 

300 

130898 

436.327 

-1.156 

ll-way  Test,  ChiSquare  Approximation 

ChiSquare 

DF 

Prob>ChiSq 

1.5379 

2 

0.4635 
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Test  Problem  2 


Combined  Norm 


Statistical  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig  LocOMert) 

2.873339 

4.013382 

5.779123 

7.959836 

10.52179 

Krig  LocOMerl 

2.63661 

3.689953 

5.551306 

8.279745 

10.51901 

Krig  Loci  Me  rO 

2.805404 

3.99156 

5.802168 

7.982192 

9.77672 

Krig  Loci  Me  rl 

2.466074 

4.214961 

5.947904 

9.121276 

12.12097 

NWLocOMerO 

2.668076 

4.083172 

5.555668 

8.00502 

11.52101 

NWLocOMerl 

2.260656 

3.273001 

5.27244 

7.421511 

9.516558 

NWLocIMerO 

2.605152 

3.883431 

5.944353 

7.886601 

10.42115 

NWLocIMerl 

2.057595 

3.151391 

5.018045 

6.970823 

9.597397 

None 

2.196076 

2.605398 

3.124865 

3.764776 

4.357175 

R&S  Design  100958095 
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Test  Problem  2 


RrigLocOMerO  and  KrigLoclMerO  Comparison 


Normed  Distance 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-MeanO)/StdO 

KrigLocOMerO  300  90609  302.030  0.216 

KrigLoclMerO  300  89691  298.970  -0.216 

2-Sample  Test,  Normal  Approximation 

S  Z  Prob>|Z| 

89691  -0.21596  0.8290 

1-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 

0.0467  1  0.8288 


True  Objective  Function 


Statistical  Evaluation 


KrigLocOMerO  KrigLocI  MerO 


Surrogate_Design 
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Test  Problem  2 


NWLocOMerO  and  NWLocOMerl  Comparison 


Normed  Distance 


Statistical  Evaluation 


NWLocOMerO  NWLocOMerl 

Surrogate_Design 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-MeanO)/StdO 

NWLocOMerO  300  95154.5  317.182  2.357 

NWLocOMerl  300  85145.5  283.818  -2.357 

2-Sample  Test,  Normal  Approximation 

S  Z  Prot»|Z| 

85145.5  -2.35695  0.0184 

1-way  Test,  ChlSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 
5.5563  1  0.0184 


True  Objective  Function 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 


Count  Score  Sum  Score  Mean 
300  92768.5  309.228 

300  87531.5  291.772 


(Mean-Mean0)/Std0 

1.233 

-1.233 
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Test  Problem  3 


Combined  Norm 


R&S  Design  75907090 


Statistical  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Krig  LocOMe  rO 

0.151032 

0.231649 

0.388473 

0.439145 

0.46041  8 

Krig  LocOMe  rl 

0.08946 

0.134483 

0.164621 

0.209701 

0.282843 

Krig  Loci  Me  rO 

0.143989 

0.209848 

0.377724 

0.441052 

0.457555 

Krig  Loci  Me  rl 

0.209987 

0.468505 

1.936635 

1.967288 

1.990095 

NWLocOMerO 

0.099442 

0.140174 

0.16351 

0.205942 

0.346286 

NWLocOMerl 

0.100061 

0.139356 

0.167373 

0.211763 

0.415791 

NWLocIMerO 

0.095384 

0.14602 

0.178425 

0.413017 

0.483606 

NWLocIMerl 

0.45321 6 

0.617499 

0.799459 

1.309955 

1 .99575 

None 

0.098791 

0.13674 

0.168278 

0.20796 

0.322285 

R&S  Design  100908090 


R&S  Design  100958095 
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Test  Problem  3 


KrigLocOMerO  and  KrigLoclMerO  Comparison 


Normed  Distance 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-MeanO)/StdO 

Krig  LocOMe  it)  300  91710.5  305.702  0.735 

Krig Loci  Me lO  300  88589.5  295.298  -0.735 

2-Sample  Test,  Normal  Approximation 

S  Z  Prob>|Z| 

88589.5  -0.73478  0.4625 

(j-way  Test,  ChlSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 
0.5402  1  0.4623 


True  Objective  Function 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 

Level  Count  Score  Sum  Score  Mean  (Mean-Mean0)/Std0 

Krig  LocOMe  lO  300  91123.5  303.745  0.458 

Krig  Loci  Me  lO  300  89176.5  297.255  -0.458 

2-Sample  Test,  Normal  Approximation 

S  Z  Prob>|Z| 

89176.5  -0.45829  0.6467 

jj-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 
0.2103  1  0.6466 
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Test  Problem  3 


NWLocOMerO,  NWLocOMerl  and  None  Comparison 


Normed  Distance 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

NWLocOMerO 

300 

134265 

447.550 

-0.241 

NWLocOMerl 

300 

137605 

458.683 

0.668 

None 

300 

133580 

445.267 

-0.427 

1-way  Test,  ChiSquare  Approximation 

ChiSquare  DF  Prob>ChiSq 

0.4575  2  0.7955 


Surrogate_Design 


True  Objective  Function 


Statistical  Evaluation 


Wilcoxon  /  Kruskal-Wallis  Tests  (Rank  Sums) 


Level 

Count 

Score  Sum 

Score  Mean 

(Mean-Mean0)/Std0 

NWLocOMerO 

300 

137090 

456.967 

0.528 

NWLocOMerl 

300 

136225 

454.083 

0.292 

None 

300 

132135 

440.450 

-0.820 

M-wayTest,  ChiSquare  Approximation 

ChiSquare 

DF 

Prob>ChiSq 

0.691 1 

2 

0.7078 
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A. 4  R&S  Procedure  Adjustment  Results 


Test  Problem  1 


Original  R&S  Procedure  Modified  R&S  Procedure 


Level 

10% 

25% 

Median 

75% 

90% 

Level 

10% 

25% 

Median 

75% 

90% 

KrigLocO  MerO 

-0.92598 

-0.48503 

0.108955 

0.422495 

1.06597 

ModKrigLocOMerO 

-0.71699 

-0.48209 

-0.26268 

0.05637 

0.441054 

KrigLocOMerl 

-0.92956 

-0.3034 

0.296775 

1.209625 

6.00277 

ModKrigLocOMerl 

-0.72035 

-0.48262 

-0.20144 

0.212612 

2.54073 

KrigLocI  MerO 

-0.9601 1 

-0.60645 

0.075999 

0.42025 

1.16797 

ModKrigLocIMerO 

-0.76898 

-0.54724 

-0.28862 

-0.02475 

0.223169 

KrigLocI  Merl 

-0.22095 

0.203327 

2.2827 

21.342 

61.1114 

ModKrigLocIMerl 

-0.61419 

-0.38221 

-0.01834 

0.585703 

2.54866 

NWLocOMerO 

-1.03187 

-0.70592 

0.039461 

0.305915 

0.650844 

ModNWLocOMerO 

-0.74421 

-0.53765 

-0.31666 

-0.12487 

0.157835 

NWLocOMerl 

-0.9934 

-0.69384 

0.051803 

0.29853 

0.754418 

ModNWLocOMerl 

-0.70982 

-0.52729 

-0.31498 

-0.05775 

0.230275 

NW Loci  MerO 

-0.84534 

-0.04441 

0.309025 

0.798075 

1.38523 

ModNWLocI  MerO 

-0.59256 

-0.3051 4 

0.004371 

0.54728 

1 .27644 

NW Loci  Merl 

-0.84282 

-0.07153 

0.54272 

1.6254 

3.8815 

ModNWLocI  Merl 

-0.62639 

-0.40874 

-0.10498 

0.283445 

0.963726 

None 

-1.03203 

-0.63705 

0.056497 

0.271562 

0.5642 

ModNone 

-0.7617 

-0.55659 

-0.33704 

-0.09477 

0.21 3755 

Original  R&S  Procedure  Modified  R&S  Procedure 


Level 

10% 

25% 

Median 

75% 

90% 

Level 

10% 

25% 

Median 

75% 

90% 

KrigLocO  MerO 

0.01 8823 

0.0795 

0.190926 

0.526644 

1.316783 

ModKrigLocOMerO 

0.030646 

0.079851 

0.231983 

0.473079 

0.758084 

KrigLocOMerl 

0.045361 

0.130981 

0.401911 

1.372592 

26.00287 

ModKrigLocOMerl 

0.025844 

0.07554 

0.247169 

0.51 4493 

3.131908 

KrigLocI  MerO 

0.01  0442 

0.066388 

0.160579 

0.532767 

1.377032 

ModKrigLocIMerO 

0.024298 

0.071111 

0.165326 

0.372709 

0.731697 

KrigLocI  Merl 

0.104037 

0.349397 

2.862748 

357.4401 

604.1474 

ModKrigLocIMerl 

0.051976 

0.112076 

0.340956 

1 .024658 

4.294366 

NWLocOMerO 

0.011117 

0.047069 

0.136719 

0.349507 

0.80145 

ModNWLocOMerO 

0.02277 

0.063947 

0.147813 

0.299645 

0.500168 

NWLocOMerl 

0.009573 

0.052716 

0.134433 

0.41  9762 

1.070177 

ModNWLocOMerl 

0.01  8738 

0.067017 

0.15843 

0.31 3768 

0.607162 

NW  Loci  MerO 

0.071668 

0.168019 

0.47774 

1.019575 

2.076141 

ModNWLocI  MerO 

0.079722 

0.176804 

0.395631 

0.864855 

1 .71 4046 

NW  Loci  Merl 

0.073508 

0.224128 

0.690541 

1.903625 

6.811658 

ModNWLocI  Merl 

0.054815 

0.146886 

0.31  9635 

0.721275 

1.344146 

None 

0.007281 

0.032205 

0.137564 

0.332082 

0.855352 

ModNone 

0.01  7621 

0.048951 

0.147823 

0.298284 

0.559412 
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Test  Problem  2 


Original  R&S  Procedure 


Modified  R&S  Procedure 


Level 

10% 

25% 

Median 

75% 

90% 

KrigLocOMerO 

-0.99971 

-0.81612 

-0.28567 

0.535453 

1.27745 

KrigLocO  Merl 

-0.98078 

-0.69523 

0.027073 

0.97654 

3.04455 

KrigLocI  MerO 

-1.02275 

-0.77672 

-0.18204 

0.432145 

1.27745 

KrigLocI  Merl 

-0.74777 

-0.2231 1 

0.407095 

1.128875 

2.79799 

NWLocOMerO 

-1.04465 

-0.82505 

-0.31495 

0.438325 

1.18745 

NWLocOMerl 

-1.01137 

-0.77523 

-0.41217 

0.260085 

0.907198 

NWLocIMerO 

-0.98489 

-0.731 1 1 

-0.11371 

0.434955 

1.06983 

NWLocIMerl 

-1.04346 

-0.80376 

-0.03194 

0.347608 

0.789408 

None 

-1.21618 

-1.07108 

-0.90685 

-0.7281 6 

-0.50436 

Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

ModKrigLocOMerO 

-0.62332 

-0.43376 

-0.12279 

0.30105 

0.923555 

ModKrigLocOMerl 

-0.63772 

-0.35651 

-0.07594 

0.41 9698 

1.60339 

ModKrigLocIMerO 

-0.61278 

-0.40637 

-0.11949 

0.196968 

0.647549 

ModKrigLocIMerl 

-0.7032 

-0.43528 

-0.16037 

0.238968 

0.926588 

ModNWLocOMerO 

-0.64132 

-0.4281  7 

-0.12803 

0.231753 

0.691485 

ModNWLocOMerl 

-0.63451 

-0.40867 

-0.15459 

0.228718 

0.579323 

ModNWLocI  MerO 

-0.71671 

-0.46036 

-0.24761 

-0.00233 

0.223772 

ModNWLocI  Merl 

-0.71786 

-0.50881 

-0.26754 

-0.0155 

0.391129 

ModNone 

-0.81529 

-0.62446 

-0.4291 9 

-0.23596 

-0.05539 

Original  R&S  Procedure 


Modified  R&S  Procedure 


Level 

10% 

25% 

Median 

75% 

90% 

KrigLocO  MerO 

0.121571 

0.267207 

0.641923 

1.19316 

2.082607 

KrigLocO  Merl 

0.168234 

0.335471 

0.692988 

1.573814 

3.395072 

KrigLocI  MerO 

0.11768 

0.252543 

0.655294 

1.145067 

1.739752 

KrigLocI  Merl 

0.179431 

0.400148 

0.786821 

1.61 1666 

3.135825 

NWLocOMerO 

0.153532 

0.298775 

0.637065 

1.181302 

1.859797 

NWLocOMerl 

0.142914 

0.259851 

0.609259 

1.050145 

1.688589 

NWLocIMerO 

0.113322 

0.243978 

0.526066 

0.96028 

1.697488 

NWLocIMerl 

0.067966 

0.21 262 

0.38459 

0.742093 

1.393495 

None 

0.053698 

0.116962 

0.204723 

0.34481 1 

0.539051 

Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

ModKrigLocOMerO 

0.086734 

0.202793 

0.433352 

0.885896 

1.629481 

ModKrigLocOMerl 

0.083585 

0.21 8244 

0.551076 

1.036486 

1.942927 

ModKrigLocIMerO 

0.088778 

0.195346 

0.42009 

0.767944 

1.258362 

ModKrigLocIMerl 

0.083698 

0.176274 

0.348271 

0.723024 

1.455409 

ModNWLocOMerO 

0.091718 

0.176525 

0.383691 

0.71 9563 

1.185811 

ModNWLocOMerl 

0.09876 

0.186555 

0.371616 

0.752262 

1 .25283 

ModNWLocI  MerO 

0.053335 

0.138543 

0.311351 

0.542609 

0.907939 

ModNWLocI  Men 

0.057252 

0.123268 

0.252135 

0.531049 

0.969244 

ModNone 

0.046082 

0.075756 

0.133034 

0.227792 

0.367479 
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Test  Problem  3 


Original  R&S  Procedure 


Modified  R&S  Procedure 


Quantiles 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Level 

10% 

25% 

Median 

75% 

90% 

KrigLocOMerO 

-0.98024 

-0.63477 

0.063467 

0.206845 

0.474602 

ModKrigLocOMerO 

-0.74184 

-0.52022 

-0.2917 

-0.06141 

0.174166 

KrigLocOMerl 

-0.93332 

-0.443 

0.1088 

0.51 4598 

28.513 

ModKrigLocOMerl 

-0.7361 3 

-0.50058 

-0.2331 7 

0.048396 

2.84933 

KrigLocI  MerO 

-0.96404 

-0.67035 

0.062208 

0.263853 

0.522427 

ModKrigLocIMerO 

-0.67427 

-0.49466 

-0.3006 

-0.07138 

0.167302 

KrigLocI  Merl 

1.20548 

3.736325 

4.71 31 

7.03615 

10.3296 

ModKrigLocIMerl 

2.9276 

3.24875 

3.48945 

3.7756 

4.06281 

NWLocOMerO 

-0.88065 

-0.48029 

0.08402 

0.256855 

0.569916 

ModNWLocOMerf) 

-0.72207 

-0.50959 

-0.31232 

-0.09491 

0.205445 

NWLocOMerl 

-0.93165 

-0.58655 

0.074842 

0.2668 

0.51  0292 

ModNWLocOMerl 

-0.73923 

-0.50484 

-0.2781 4 

-0.05958 

0.203608 

NW Loci  MerO 

-0.86524 

-0.4751 8 

0.095374 

0.347477 

0.721095 

ModNWLocI  MerO 

-0.62331 

-0.43377 

-0.19472 

0.135865 

0.873579 

NW Loci  Merl 

-0.7328 

-0.31747 

0.362165 

1.3743 

3.82338 

ModNWLocI  Merl 

-0.5591 2 

-0.34954 

-0.1136 

0.139703 

0.733484 

None 

-0.90168 

-0.55789 

0.075631 

0.290478 

0.540055 

ModNone 

-0.81668 

-0.57386 

-0.35295 

-0.09683 

0.136145 

Original  R&S  Procedure 


Modified  R&S  Procedure 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

Level 

10% 

25% 

Median 

75% 

90% 

KrigLocOMerO 

0.083977 

0.136903 

0.224262 

0.412913 

0.71 3729 

ModKrigLocOMerO 

0.075942 

0.116269 

0.181236 

0.322895 

0.546881 

KrigLocOMerl 

0.060742 

0.135336 

0.346403 

0.772988 

28.51369 

ModKrigLocOMerl 

0.06924 

0.115305 

0.238733 

0.506078 

1.943926 

KrigLocI  MerO 

0.065662 

0.11535 

0.233313 

0.42132 

0.660879 

ModKrigLocIMerO 

0.077771 

0.129834 

0.208359 

0.39016 

0.581663 

KrigLocI  Merl 

1.453449 

4.023265 

4.91 4021 

7.322987 

10.35761 

ModKrigLocIMerl 

3.840532 

3.894317 

4.01 2275 

4.178733 

4.407357 

NWLocOMerO 

0.064632 

0.138931 

0.265122 

0.525214 

0.81 4552 

ModNWLocOMerO 

0.055592 

0.111516 

0.217124 

0.389035 

0.601266 

NWLocOMerl 

0.059321 

0.137515 

0.276099 

0.462575 

0.746003 

ModNWLocOMerl 

0.049932 

0.109708 

0.229312 

0.401238 

0.606474 

NW  Loci  MerO 

0.069852 

0.145779 

0.321323 

0.589814 

0.940543 

ModNWLocI  MerO 

0.079445 

0.133717 

0.284633 

0.491141 

1.396926 

NW  Loci  Merl 

0.190882 

0.31 9809 

0.598734 

1.437462 

3.995306 

ModNWLocI  Merl 

0.173232 

0.238003 

0.36267 

0.586544 

1.377932 

None 

0.051898 

0.118992 

0.249486 

0.501742 

0.847007 

ModNone 

0.043354 

0.086942 

0.187836 

0.356152 

0.507798 

119 


A. 5  Modified  Main  Run  Data  Summary 


Test  Problem  1 


Combined  Norm 


Surrogate_Design 


R&S  Design  75907090 


Statistical  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

ModKrigLocOMerO 

0.183082 

0.279589 

0.462733 

0.696756 

0.97981 

ModKrigLocOMerl 

0.183303 

0.289063 

0.512681 

0.846654 

1 .272726 

ModKrigLocI  MerO 

0.184885 

0.276269 

0.446392 

0.648276 

0.894085 

ModKrigLocI  Merl 

0.216324 

0.373294 

0.631855 

1.066242 

1.705945 

ModNWLocOMerO 

0.169529 

0.241604 

0.383253 

0.560456 

0.734748 

ModNWLocOMerl 

0.164824 

0.246897 

0.379482 

0.557653 

0.764497 

ModNWLocIMerO 

0.325078 

0.518189 

0.767995 

1.073243 

1.337804 

ModNWLocIMerl 

0.234517 

0.429105 

0.653339 

0.981983 

1.233132 

ModNone 

0.144165 

0.223778 

0.336995 

0.51306 

0.770634 

R&S  Design  100908090 


R&S  Design  100958095 
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Test  Problem  2 


Combined  Norm 


Statistical  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

ModKrigLocOMerO 

2.81051 

4.113336 

6.068051 

9.030545 

11.43659 

ModKrigLocOMerl 

2.585913 

3.983391 

6.76084 

9.375185 

12.49002 

ModKrigLocIMerO 

2.712457 

3.865991 

6.551751 

8.911808 

11.53997 

ModKrigLocIMerl 

2.411108 

3.936627 

6.042433 

8.523032 

11.51943 

ModNWLocOMerO 

2.705639 

4.508757 

7.173094 

10.00994 

13.44253 

ModNWLocOMerl 

2.811929 

4.430229 

6.852268 

9.626155 

12.01134 

ModNWLocIMerO 

2.457249 

4.099972 

6.249081 

8.716219 

10.79959 

ModNWLocIMerl 

2.406234 

3.893905 

6.000206 

9.219319 

12.11883 

ModNone 

1.768445 

2.523795 

3.093501 

3.766776 

4.763328 

R&S  Design  75907090 


R&S  Design  100908090 


R&S  Design  100958095 
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Test  Problem  3 


Combined  Norm 


R&S  Design  75907090 


Statistical  Evaluation 


Quantiles 


Level 

10% 

25% 

Median 

75% 

90% 

ModKrigLocOMerO 

0.131378 

0.199246 

0.355268 

0.433448 

0.474734 

ModKrigLocOMerl 

0.077898 

0.120507 

0.154298 

0.216067 

0.37532 

ModKrigLocI  MerO 

0.138444 

0.198441 

0.346302 

0.43218 

0.474837 

ModKrigLocI  Merl 

1.951534 

1.966005 

1.981919 

1.995479 

2.006684 

ModNWLocOMerO 

0.051673 

0.09949 

0.173924 

0.287299 

0.451325 

ModNWLocOMerl 

0.049727 

0.0954 

0.166327 

0.275793 

0.396389 

ModNW Loci  MerO 

0.081499 

0.154015 

0.266646 

0.464458 

1.211394 

ModNWLocIMerl 

0.441 

0.563356 

0.662058 

0.750048 

1.274063 

ModNone 

0.0839 

0.127854 

0.165218 

0.226187 

0.395346 

R&S  Design  100908090 


R&S  Design  100958095 
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